Introduction

Welcome to the BEHAV3D Tumor Profiler Demo. Here you can find a demo example of how the Behaved Tumor Profiler pipeline works and what each of the outputs can provide.
Tu run the pipeline for yourself, please refer to the Github page, where you can find a wider explanation of the pipeline, as well as the Google Colab and R script versions.

Open Colab notebook

Firstly all the necessary libraries must be installed

Which libraries do I need?


library(dplyr)
library(dtwclust)
library(e1071)
library(emmeans)
library(ggplot2)
library(gtools)
library(parallel)
library(pheatmap)
library(plotly)
library(plyr)
library(RANN)
library(readr)
library(reshape2)
library(scales)
library(sp)
library(spatstat)
library(stats)
library(tidyr)
library(umap)
library(viridis)
library(zoo)

We set a seed for reproducibility:

set.seed(123)

❔ What data can I use?

Input data can be obtained from image analysis software like Imaris (Oxford Instruments), where you extract the relevant features for each cell types. Each cell type must be inputted separately into the pipeline.

Input data must follow the following format: Mouse_CellType_timelapse_LargeScaleRegion_Feature.csv

Our analyzed Cell types
Tumor Cells SR101 MG (Microglia) BV (Blood Vessel)

Example: 2430F13_SR101_timelapse_CL3_2_Distance.csv

Mouse information includes: 2430 -> Mother | F -> Sex | 13 -> Day

LargeScaleRegion information includes: CL2 -> Class (Environmental Cluster) | 2430F13.CL2_2 or just CL2_2 -> Position

These are the features to be uploaded:

  • Displacement
  • Distance_tumor
  • Displacement_Delta_Length
  • Displacement_Length
  • Position
  • Speed
  • Time
How does input data look like?


The data is outputed by softwares like Imaris in a .csv format.

For example, a Tumor cells 2430F11 CL1_1 displacement file would look like this (Each row represents a single cell track):

And a SR101 2430F16 CL1_1 Position file would look like this:

📂 Importing data

There are two ways to start or continue your analysis in BEHAV3D Tumor Profiler:

1. You can import your data and start your analysis from the beginning

Data import


We need to import the dataset and classifying the information into distinct dataframes
Create a reading function, that reads input from Imaris software:

## function to import the stats of the tracked tumor cells data from csv files extracted by
## Imaris
read_plus <- function(flnm) {
    read_csv(flnm, skip = 3, col_types = cols()) %>%
        mutate(filename = flnm)
}


Now we can import the datafiles of interest. We are basically extracting the information from .csv files and creating the corresponding dataframes

Tumor cells information


## set directory where csv files are located
library(plyr)
working_directory <- "D:/BEHAV3D_Tumor_Profiler/data/Tracked_tumor_cells"
setwd(working_directory)
# import volume per organoid object
pat = "*Displacement"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
files <- files[seq(1, length(files), 3)]
disp_csv <- ldply(files, read_plus)
# import distance to tumor
pat = "*Distance_tumor"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
dist_tumor_csv <- ldply(files, read_plus)
# import area per organoid object
pat = "*Displacement_Delta_Length"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
dipl_delta_csv <- ldply(files, read_plus)

# import area per organoid object
pat = "*Displacement_Length"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
dipl_length_csv <- ldply(files, read_plus)
# import position per organoid object
pat = "*Position"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
pos_csv <- ldply(files, read_plus)

# import speed
pat = "*Speed"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
speed_csv <- ldply(files, read_plus)

# import Time
pat = "*Time"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
time_csv <- ldply(files, read_plus)

### ## make each ID unique (IDs imported from different imaging files can be repeated):
category <- as.factor(dist_tumor_csv$filename)
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
dist_tumor_csv <- left_join(dist_tumor_csv, ranks)
dist_tumor_csv$ID2 <- with(dist_tumor_csv, interaction(ID, ranks))

category <- as.factor(pos_csv$filename)
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
pos_csv <- left_join(pos_csv, ranks)
pos_csv$ID2 <- with(pos_csv, interaction(ID, ranks))

category <- as.factor(time_csv$filename)
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
time_csv <- left_join(time_csv, ranks)
time_csv$ID2 <- with(time_csv, interaction(ID, ranks))

category <- as.factor(speed_csv$filename)
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
speed_csv <- left_join(speed_csv, ranks)
speed_csv$ID2 <- with(speed_csv, interaction(ID, ranks))

category <- as.factor(dipl_length_csv$filename)  ##this measures how much distnace did a cell move over, a cell can mover over a big distance and stay at the same position
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
dipl_length_csv <- left_join(dipl_length_csv, ranks)
dipl_length_csv$ID2 <- with(dipl_length_csv, interaction(ID, ranks))

category <- as.factor(dipl_delta_csv$filename)  ## this variable measures how much distance did the cell move from its origin
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
dipl_delta_csv <- left_join(dipl_delta_csv, ranks)
dipl_delta_csv$ID2 <- with(dipl_delta_csv, interaction(ID, ranks))

category <- as.factor(disp_csv$filename)  ## a bit similar to the previous ones, this measures the area that a cell covers overtime
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
disp_csv <- left_join(disp_csv, ranks)
disp_csv$ID2 <- with(disp_csv, interaction(ID, ranks))



Blood vessel (BV) information


Firstly, we create the dataframe containing the statistics of interest 

detach(package:plyr)

## create dataframe containing the statistics of interest
master_1 <- left_join(dist_tumor_csv[, c(1, 13)], time_csv[, c(1, 5, 6, 8, 10, 11)], by = c(ID2 = "ID2"))
master_1 <- left_join(master_1, disp_csv[, c(1, 11)], by = c(ID2 = "ID2"))
master_1 <- left_join(master_1, dipl_delta_csv[, c(1, 11)], by = c(ID2 = "ID2"))
master_1 <- left_join(master_1, dipl_length_csv[, c(1, 11)], by = c(ID2 = "ID2"))
master_1 <- left_join(master_1, speed_csv[, c(1, 11)], by = c(ID2 = "ID2"))
master <- left_join(master_1, pos_csv[, c(1, 2, 3, 14)], by = c(ID2 = "ID2"))

colnames(master) <- c("dist_tumor", "ID2", "Time", "Tracked", "TrackID", "filename", "ranks", "disp2",
    "disp_d", "disp_l", "speed", "x", "y", "z")

# Extract mouse data (ID) using regexp (mother+sex+day)
master$mouse <- master$filename
master$mouse <- gsub("^.*[_/](\\d*[MF]\\d+).*", "\\1", master$mouse, perl = TRUE)

# Extract position metadata
master$position <- master$filename
master$position <- gsub(".*_(CL\\d+_\\d+)_.*", "\\1", master$position, perl = TRUE)
master$position <- with(master, interaction(mouse, position))


# Extract class metadata
master$class <- master$position
master$class <- gsub(".*(CL\\d+)_.*", "\\1", master$class, perl = TRUE)

master$filename <- NULL
## Round the time
master$Time <- round(master$Time, digits = 2)


## check that the imported tracks look ok
g <- ggplot(master) + geom_path(aes(x, y, group = TrackID), size = 0.2, arrow = arrow(ends = "last",
    type = "open", length = unit(0.01, "inches"))) + theme_bw() + theme(aspect.ratio = 1) + facet_wrap(class ~
    position, scales = "free")

g

, the Blood Vessel information is extracted:

### Import BLOOD VESSELS stats (does not exist for all tracked positions so needs to be
### processed separtely.)
working_directory2 <- "D:/BEHAV3D_Tumor_Profiler/data/BV_stats"
setwd(working_directory2)

files <- list.files(path = working_directory2, full.names = T, recursive = TRUE)
BV_df <- ldply(files, read_plus)

### Process blood vessels data:
BV_df <- BV_df[c(1, 6, 7, 8, 9, 11)]
colnames(BV_df) <- c("dist_BV", "Time", "Tracked", "TrackID", "ID", "filename")

## rename mouse accroding to filename:
BV_df$mouse <- BV_df$filename
BV_df$mouse <- gsub("^.*[_/](\\d*[MF]\\d+).*", "\\1", BV_df$mouse, perl = TRUE)

BV_df$position <- BV_df$filename
BV_df$position <- gsub(".*_(CL\\d+_\\d+)_.*", "\\1", BV_df$position, perl = TRUE)
BV_df$position <- with(BV_df, interaction(mouse, position))

BV_df$class <- BV_df$position
BV_df$class <- gsub(".*(CL\\d+)_.*", "\\1", BV_df$class, perl = TRUE)


## create ranks column based on master ranks:
ranks_master <- subset(master, position %in% BV_df$position)
ranks_master <- ranks_master[!duplicated(ranks_master$position), c(6, 15)]
colnames(ranks_master)[1] <- "ranks_bv"
BV_df <- left_join(BV_df, ranks_master)

## keep only tracked cells and with a TrackID:
BV_df <- subset(BV_df, Tracked == "Tracked")
## remove any NA:
BV_df <- na.omit(BV_df)

## create unique Track_ID for Blood vessels dataframe:
BV_df$Track2 <- with(BV_df, interaction(ranks_bv, TrackID))
BV_df$Track2 <- gsub(".", "", BV_df$Track2, fixed = T)
BV_df$Track2 <- as.numeric(as.character(BV_df$Track2))
## sumarize per TrackID the average, min and max distance to blood vessels
ggplot(BV_df, aes(dist_BV)) + geom_histogram() + facet_wrap(. ~ position)
BV_df <- na.omit(BV_df)
BV_df$BV_contact <- ifelse(BV_df$dist_BV < 4, 0, 1)
BV_df_sum <- BV_df %>%
    group_by(Track2) %>%
    summarise(BV_sd = sd(dist_BV)/sqrt(length(dist_BV)), BV_mean = mean(dist_BV), BV_min = min(dist_BV),
        BV_max = max(dist_BV), BV_contact = mean(BV_contact))


SR101 information


library(plyr)

## import Sr101 cells for the populations that have them

working_directory <- "D:/BEHAV3D_Tumor_Profiler/data/SR101_objects"
setwd(working_directory)
# import volume per organoid object

# import position per organoid object
pat = "*Position"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
pos_csv <- ldply(files, read_plus)


# import Time
pat = "*Time"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
time_csv <- ldply(files, read_plus)

### ## make each TrackID unique (TrackIDs imported from different imaging files can be
### repeated):

category <- as.factor(pos_csv$filename)
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
pos_csv <- left_join(pos_csv, ranks)
pos_csv$ID2 <- with(pos_csv, interaction(ID, ranks))

category <- as.factor(time_csv$filename)
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
time_csv <- left_join(time_csv, ranks)
time_csv$ID2 <- with(time_csv, interaction(ID, ranks))

detach(package:plyr)

## create dataframe containing the statistics of interest
master_SR101 <- left_join(pos_csv[, c(1, 2, 3, 10, 12)], time_csv[, c(1, 9)], by = c(ID2 = "ID2"))


colnames(master_SR101) <- c("x", "y", "z", "filename", "ID2", "Time")

## rename mouse accroding to filename: Extract mouse data (ID) using regexp (mother+sex+day)
master_SR101$mouse <- master_SR101$filename
master_SR101$mouse <- gsub("^.*[_/](\\d*[MF]\\d+).*", "\\1", master_SR101$mouse, perl = TRUE)

# Extract position metadata
master_SR101$position <- master_SR101$filename
master_SR101$position <- gsub(".*_(CL\\d+_\\d+)_.*", "\\1", master_SR101$position, perl = TRUE)
master_SR101$position <- with(master_SR101, interaction(mouse, position))

# Extract class metadata
master_SR101$class <- master_SR101$position
master_SR101$class <- gsub(".*(CL\\d+)_.*", "\\1", master_SR101$class, perl = TRUE)

master_SR101$filename <- NULL

## Convert imported time to hours:
master_SR101$Time <- round(master_SR101$Time, digits = 2)



Microglia (MG) information


## import MG cells for the populations that have them
library(plyr)
working_directory <- "D:/BEHAV3D_Tumor_Profiler/data/MG_objects"
setwd(working_directory)

# import position per organoid object
pat = "*Position"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
pos_csv <- ldply(files, read_plus)


# import Time
pat = "*Time"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
time_csv <- ldply(files, read_plus)

### ## make each TrackID unique (TrackIDs imported from different imaging files can be
### repeated):

category <- as.factor(pos_csv$filename)
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
pos_csv <- left_join(pos_csv, ranks)
pos_csv$ID2 <- with(pos_csv, interaction(ID, ranks))

category <- as.factor(time_csv$filename)
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
time_csv <- left_join(time_csv, ranks)
time_csv$ID2 <- with(time_csv, interaction(ID, ranks))

detach(package:plyr)

## create dataframe containing the statistics of interest
master_MG <- left_join(pos_csv[, c(1, 2, 3, 10, 12)], time_csv[, c(1, 9)], by = c(ID2 = "ID2"))


colnames(master_MG) <- c("x", "y", "z", "filename", "ID2", "Time")

## rename mouse accroding to filename: Extract mouse data (ID) using regexp (mother+sex+day)

master_MG$mouse <- master_MG$filename
master_MG$mouse <- gsub("^.*[_/](\\d*[MF]\\d+).*", "\\1", master_MG$mouse, perl = TRUE)

# Extract position metadata
master_MG$position <- master_MG$filename
master_MG$position <- gsub(".*_(CL\\d+_\\d+)_.*", "\\1", master_MG$position, perl = TRUE)
master_MG$position <- with(master_MG, interaction(mouse, position))

# Extract class metadata
master_MG$class <- master_MG$position
master_MG$class <- gsub(".*(CL\\d+)_.*", "\\1", master_MG$class, perl = TRUE)

master_MG$filename <- NULL

## Convert imported time to hours:
master_MG$Time <- round(master_MG$Time, digits = 2)


Pre-processing

Now we need to join the information from all of the sources into a single dataframe

Interactions between cells


We first calulate the interactions between the nearest cells:

### test that tracks imported are fine:
master_test <- master %>%
    group_by(position) %>%
    mutate(x = x - min(x), y = y - min(y))
g <- ggplot(master) + geom_path(aes(x, y, group = TrackID, colour = dist_tumor), arrow = arrow(ends = "last",
    type = "closed", length = unit(0.005, "inches"))) + theme_bw()
g <- g + theme(legend.position = "none") + facet_wrap(. ~ position, scales = "free")
g

## To calculate interactions between cells

List2 = list()
for (m in unique(master$position)) {
    distance_M4 <- master[which(master$position == m), ]

    List = list()
    for (i in unique(distance_M4$Time)) {
        distancei <- distance_M4[distance_M4$Time == i, ]
        distancei2 <- as.data.frame(distancei[, c(3, 11, 12, 13)])
        coordin <- ppx(distancei2, coord.type = c("t", "s", "s", "s"))
        dist1 <- nndist(coordin)
        dist2 <- nndist(coordin, k = 2)
        dist3 <- nndist(coordin, k = 3)
        dist4 <- nndist(coordin, k = 4)
        dist5 <- nndist(coordin, k = 5)
        dist6 <- nndist(coordin, k = 6)
        dist7 <- nndist(coordin, k = 7)
        dist8 <- nndist(coordin, k = 8)
        dist9 <- nndist(coordin, k = 9)
        dist10 <- nndist(coordin, k = 10)
        dist <- cbind(dist1, dist2, dist3, dist4, dist5, dist6, dist7, dist8, dist9, dist10)  ### calculate distance to the three nearest neighbors
        dist <- data.frame(rowMeans(dist[, c(1:3)]), rowMeans(dist))
        distanceDFi_dist <- cbind(distancei, dist)
        List[[length(List) + 1]] <- distanceDFi_dist  ## store to list object
    }
    master_dist <- do.call(rbind, List)  ## convert List to dta
    List2[[length(List2) + 1]] <- master_dist
}
master_distance <- do.call(rbind, List2)
colnames(master_distance)[c(17, 18)] <- c("dist_3_neigh", "dist_10_neigh")

### remove the untracked tracks:
master_distance <- subset(master_distance, Tracked == "Tracked")

## create a unique track_ID for master:
master_distance$Track2 <- with(master_distance, interaction(ranks, TrackID))
master_distance$Track2 <- gsub(".", "", master_distance$Track2, fixed = T)
master_distance$Track2 <- as.numeric(as.character(master_distance$Track2))


## check that the imported tracks look ok
g <- ggplot(master_distance) + geom_path(aes(x, y, group = Track2), size = 0.2, arrow = arrow(ends = "last",
    type = "open", length = unit(0.01, "inches"))) + theme_bw() + theme(aspect.ratio = 1) + facet_wrap(class ~
    position, scales = "free")

g

For the SR101 and MG datasets, we separately find the distance to tumor cells

For SR101


## For SR101
master_distance_SR101 <- subset(master_distance, mouse %in% unique(master_SR101$mouse))
### Calculate distance from Tumor cells to SR101.  Considering that olig2 doens;t move much,
### let's project the position of the birghtest timepoint from each movie to all timepoints.
### Identify birghtest timepoint by the number of SR101 objects:
brightest_SR101 <- master_SR101 %>%
    group_by(Time, position) %>%
    summarise(n = n()) %>%
    group_by(position) %>%
    filter(n == max(n))
master_SR101 <- subset(master_SR101, Time %in% brightest_SR101$Time)

List2 <- list()
for (m in unique(master_distance_SR101$position)) {
    master_pos <- master_distance_SR101 %>%
        filter(position == m) %>%
        group_by(Time) %>%
        arrange(Time)
    master_SR101_pos <- master_SR101 %>%
        filter(position == m) %>%
        group_by(Time) %>%
        arrange(Time)
    List1 = list()  ## create list for ID of the contacting objects
    for (t in unique(master_pos$Time)) {
        master_pos_t <- master_pos %>%
            filter(Time == t)
        ### calculate distance to 5 neigrest neighorbing SR101 in a radius of 40um
        cells_radius <- nn2(data = master_SR101_pos[c("x", "y", "z")], query = master_pos_t[c("x",
            "y", "z")], k = 30, treetype = "bd", searchtype = "radius", radius = 30)
        cells_min <- nn2(data = master_SR101_pos[c("x", "y", "z")], query = master_pos_t[c("x",
            "y", "z")], k = 1, treetype = "bd", searchtype = "standard")

        dist_table = data.frame(master_pos_t[c("Time", "Track2", "mouse", "position", "class")],
            cells_min[["nn.dists"]], cells_radius[["nn.dists"]])
        List1[[length(List1) + 1]] <- dist_table
        rm(temp)
    }
    cell_radius_t <- do.call(rbind, List1)
    List2[[length(List2) + 1]] <- cell_radius_t
    rm(cell_radius_t)
}

master_dist_SR101 <- do.call(rbind, List2)
### convert the distant cells to 0

temp2 <- master_dist_SR101[c("X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10", "X11",
    "X12", "X13", "X14", "X15", "X16", "X17", "X18", "X19", "X20", "X21", "X22", "X23", "X24", "X25",
    "X26", "X27", "X28", "X29", "X30")]
temp2[temp2 > 1.340781e+153] <- 0

### Calculate number of contacts:
master_dist_SR101$n_SR101 <- rowSums(temp2 != 0)
### renamen min_SR101:
colnames(master_dist_SR101)[colnames(master_dist_SR101) == "cells_min...nn.dists..."] <- "min_SR101"

### bind information on SR101 distnaces to the main dataframe:
master_dist_SR101[c("X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10", "X11", "X12", "X13",
    "X14", "X15", "X16", "X17", "X18", "X19", "X20", "X21", "X22", "X23", "X24", "X25", "X26", "X27",
    "X28", "X29", "X30")] <- NULL
## For each Track2 find what is the minimal distance to SR101 and the max n of SR101
## neiighbors in 30um diameter:
master_distance_SR101 <- master_dist_SR101 %>%
    group_by(Track2, mouse, position, class) %>%
    summarize(n_SR101 = max(n_SR101), min_SR101 = min(min_SR101))
### plot per position types the values of
p <- ggplot(master_distance_SR101) + geom_jitter(aes(x = class, y = n_SR101)) + geom_bar(aes(x = class,
    y = n_SR101), alpha = 0.5, stat = "summary") + ggtitle("number of neighboring SR101 cells per position type") +
    theme_classic() + theme(aspect.ratio = 0.7)

p

p <- ggplot(master_distance_SR101) + geom_jitter(aes(x = class, y = min_SR101)) + geom_boxplot(aes(x = class,
    y = min_SR101), alpha = 0.5, outlier.colour = NA) + ggtitle("min distance to SR101 cells per position type") +
    theme_classic() + theme(aspect.ratio = 0.7)

p


For MG


## For MG
master_distance_MG <- subset(master_distance, mouse %in% unique(master_MG$mouse))
### Calculate distance from Tumor cells to MG.  Considering that olig2 doens;t move much,
### let's project the position of the birghtest timepoint from each movie to all timepoints.
### Identify birghtest timepoint by the number of MG objects:
brightest_MG <- master_MG %>%
    group_by(Time, position) %>%
    summarise(n = n()) %>%
    group_by(position) %>%
    filter(n == max(n))
master_MG <- subset(master_MG, Time %in% brightest_MG$Time)

List2 <- list()
for (m in unique(master_distance_MG$position)) {
    master_pos <- master_distance_MG %>%
        filter(position == m) %>%
        group_by(Time) %>%
        arrange(Time)
    master_MG_pos <- master_MG %>%
        filter(position == m) %>%
        group_by(Time) %>%
        arrange(Time)
    List1 = list()  ## create list for ID of the contacting objects
    for (t in unique(master_pos$Time)) {
        master_pos_t <- master_pos %>%
            filter(Time == t)
        ### calculate distance to 5 neigrest neighorbing MG in a radius of 30um
        cells_radius <- nn2(data = master_MG_pos[c("x", "y", "z")], query = master_pos_t[c("x",
            "y", "z")], k = 30, treetype = "bd", searchtype = "radius", radius = 30)
        cells_min <- nn2(data = master_MG_pos[c("x", "y", "z")], query = master_pos_t[c("x", "y",
            "z")], k = 1, treetype = "bd", searchtype = "standard")

        dist_table = data.frame(master_pos_t[c("Time", "Track2", "mouse", "position", "class")],
            cells_min[["nn.dists"]], cells_radius[["nn.dists"]])
        List1[[length(List1) + 1]] <- dist_table
        rm(temp)
    }
    cell_radius_t <- do.call(rbind, List1)
    List2[[length(List2) + 1]] <- cell_radius_t
    rm(cell_radius_t)
}

master_dist_MG <- do.call(rbind, List2)
### convert the distant cells to 0

temp2 <- master_dist_MG[c("X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10", "X11", "X12",
    "X13", "X14", "X15", "X16", "X17", "X18", "X19", "X20", "X21", "X22", "X23", "X24", "X25", "X26",
    "X27", "X28", "X29", "X30")]
temp2[temp2 > 1.340781e+153] <- 0

### Calculate number of contacts:
master_dist_MG$n_MG <- rowSums(temp2 != 0)
### renamen min_MG:
colnames(master_dist_MG)[colnames(master_dist_MG) == "cells_min...nn.dists..."] <- "min_MG"

### bind information on MG distnaces to the main dataframe:
master_dist_MG[c("X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10", "X11", "X12", "X13",
    "X14", "X15", "X16", "X17", "X18", "X19", "X20", "X21", "X22", "X23", "X24", "X25", "X26", "X27",
    "X28", "X29", "X30")] <- NULL
## For each Track2 find what is the minimal distance to MG and the max n of MG neiighbors in
## 30um diameter:
master_distance_MG <- master_dist_MG %>%
    group_by(Track2, mouse, position, class) %>%
    summarize(n_MG = max(n_MG), min_MG = min(min_MG))
### plot per position types the values of
p <- ggplot(master_distance_MG) + geom_jitter(aes(x = class, y = n_MG)) + geom_bar(aes(x = class,
    y = n_MG), alpha = 0.5, stat = "summary") + ggtitle("number of neighboring MG cells per position type") +
    theme_classic() + theme(aspect.ratio = 0.7)

p

p <- ggplot(master_distance_MG) + geom_jitter(aes(x = class, y = min_MG)) + geom_boxplot(aes(x = class,
    y = min_MG), alpha = 0.5, outlier.colour = NA) + ggtitle("min distance to MG cells per position type") +
    theme_classic() + theme(aspect.ratio = 0.7)

p

Time of interest selection and metadata info

Scroll for further detail

Time of interest selection and substitution of NA values:

### create empyt dataframe with combination of Time of interest:
max(master_distance$Time)
## [1] 5.44
length(seq(from = 0, to = 5.6, by = 0.2))
## [1] 29
empty_df = master_distance[1:29, ]  ### create a dataframe with the same size as the sequence of Time of interest:
empty_df[!is.na(empty_df)] <- NA  ## make it empyt
empty_df$Time <- seq(from = 0, to = 5.6, by = 0.2)
empty_df$Track2 <- 0.5


### now merge this fake dataframe with my dataframe of interest:
master_distance <- rbind(master_distance, empty_df)
### complete all the datasets so that they have the same timepoints:
master_distance <- master_distance %>%
    complete(Track2, Time)
### remove the fake track that is not necessary anymore:

master_distance <- subset(master_distance, !Track2 == 0.5)


## round the number, otherwise they are not equal
master_distance$Time <- round(master_distance$Time, 2)
## if there is any track2 and time that was duplicated remove the NA Assuming your dataframe
## is named your_dataframe
master_distance <- master_distance %>%
    group_by(Time, Track2) %>%
    dplyr::slice(if (n() == 1) 1 else which(!is.na(dist_tumor)))

### create a for loop to refill all the NA with interpolated values: create a list of the
### columns names that need to be refilled
column_names <- names(master_distance)
column_names <- subset(column_names, !column_names %in% c("TrackID", "Track2", "ranks", "ID", "ID2",
    "Time", "position", "mouse", "class", "pos", "Tracked", "speed"))

## create a first dataset with refilled values for speed:
time_series <- acast(master_distance, Time ~ Track2, value.var = "speed", fun.aggregate = mean)
## rownames timepoints:
row.names(time_series) <- unique(master_distance$Time)
## get rid of NA
time_series_zoo <- zoo(time_series, row.names(time_series))
time_series_zoo <- na.approx(time_series_zoo)  ## replace by interpolated value
time_series <- as.matrix(time_series_zoo)
time_series2 <- melt(time_series)
data <- time_series2[complete.cases(time_series2), ]
colnames(data) <- c("Time", "Track2", "speed")
### ----------
for (i in column_names) {
    time_series <- acast(master_distance, Time ~ Track2, value.var = i, fun.aggregate = mean)
    row.names(time_series) <- unique(master_distance$Time)
    ## get rid of NA
    time_series_zoo <- zoo(time_series, row.names(time_series))
    time_series_zoo <- na.approx(time_series_zoo)  ## replace by last value
    time_series <- as.matrix(time_series_zoo)
    time_series2 <- melt(time_series)
    new <- time_series2[complete.cases(time_series2), ]
    data[, ncol(data) + 1] <- new[3]  # Append new column
    colnames(data)[ncol(data)] <- paste0(i)

}
### check that the data is correct by plotting the evolution of a certain variable overtime
### -----
p1 <- ggplot(data, aes(Time, disp2, group = Track2, color = as.factor(Track2))) + geom_smooth(method = "loess",
    size = 0.5, se = F, alpha = 0.3, span = 1) + theme_bw() + theme(axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 5), axis.title.y = element_text(size = 10), axis.title.x = element_text(size = 15),
    legend.text = element_text(size = 10)) + ggtitle("disp2") + theme(aspect.ratio = 1, legend.position = "none")

p1


Then, metadata information is added to the analysis:

### add metadata info

master_cor <- data
master_metadata <- master_distance[c("position", "mouse", "class", "Track2")]
master_metadata <- na.omit(master_metadata)
master_metadata <- master_metadata[!duplicated(master_metadata$Track2), ]

# detach(package:plyr, unload = TRUE)
master_cor <- left_join(master_cor, master_metadata)

### Filter timepoints of interest with same time interval:
time_of_interest <- as.character(seq(from = 0, to = 5.6, by = 0.2))  ### convert to character
master_cor$Time <- as.character(master_cor$Time)
master_cor <- subset(master_cor, Time %in% time_of_interest)
master_cor$Time <- as.numeric(master_cor$Time)  ### now back to numeric



## check that the imported tracks look ok
g <- ggplot(master_cor) + geom_path(aes(x, y, group = Track2), size = 0.2, arrow = arrow(ends = "last",
    type = "open", length = unit(0.01, "inches"))) + theme_bw() + theme(aspect.ratio = 1) + facet_wrap(class ~
    position, scales = "free")

g


Finally, the generated dataframes are exported to use as Checkpoints when repeating the analysis:

Set working directory to export dataframes

setwd("D:/BEHAV3D_Tumor_Profiler/results/rds")


All imported information:

saveRDS(master_distance, "master_distance.rds")


Time of interest subset information:

saveRDS(master_cor, "master_cor.rds")


Microglia information:

saveRDS(master_distance_MG, "master_distance_MG.rds")


SR101 information:

saveRDS(master_distance_SR101, "master_distance_SR101.rds")


Blood Vessel information:

saveRDS(BV_df_sum, "BV_df_sum.rds")


2. You can continue with a previously started analysis using the .rds files obtained.

If the RDS files are in the working directory, the BEHAV3D Tumor Profiler script will skip the data input and initial pre-processing, by using the provided .rds files.

setwd("D:/BEHAV3D_Tumor_Profiler/results")

master_cor <- readRDS("master_cor.rds")
master_distance <- readRDS("master_distance.rds")
master_distance_MG <- readRDS("master_distance_MG.rds")
master_distance_SR101 <- readRDS("master_distance_SR101.rds")
BV_df_sum <- readRDS("BV_df_sum.rds")
print("data already imported")
## [1] "data already imported"

💻 Joint data processing

After the .rds checkpoints, we can continue with joint data processing.

In this section, the joint dataframes are adjusted so all track have the same length

### sumarize the length of the tracks: create relative time
master_cor2 <- master_cor %>%
    group_by(Track2) %>%
    arrange(Time) %>%
    mutate(Time2 = Time - dplyr::first(Time))
ggplot(master_cor2, aes(Time2)) + geom_histogram() + facet_wrap(. ~ position)


This plot shows a summary of the imported tracks for each position and time

max_time <- master_cor2 %>%
    group_by(position) %>%
    summarise(max_Time = max(Time2))
min_time <- min(max_time$max_Time)  #this is the minimal time that any position has
hist(max_time$max_Time)

Here we can see the track length distribution, so we can normalize all tracks to the minimal common time:

master_cor2 <- master_cor2 %>%
    group_by(Track2) %>%
    filter(Time2 < min_time)  ##Minimal time for a position is 2.6
## keep only the tracks that are of the same length (max number of timepoints (13 in this
## case), this is how they were defined) Calculate the most frequent n for each group
subtrack_length <- master_cor2 %>%
    group_by(Track2, position) %>%
    summarise(n = n_distinct(Time2), first_t = min(Time2), last_t = max(Time2)) %>%
    ungroup()
hist(subtrack_length$n)


This histogram shows the track length across all tracks, so the most common track length can be selected

freq_n <- as.numeric(which.max(table(subtrack_length$n)))  # Most frequent n
subtrack_length <- subtrack_length %>%
    filter(n == freq_n)  # we need to automate this to get the most frequent n
# Now we only keep these subtracks that have the same length:
master_cor2 <- subset(master_cor2, Track2 %in% subtrack_length$Track2)


Plot the unique tracks for each position using the tumor distance as color label:

### create a parameter for distance to tumor core
master_cor2 <- master_cor2 %>%
    group_by(position) %>%
    mutate(dist_tumor_1 = scales::rescale(dist_tumor, to = c(0, 100)))

### normalize the distance to tumor factor per position position
master_cor2 <- master_cor2 %>%
    group_by(Track2) %>%
    arrange(Time2) %>%
    mutate(dist_tumor = dist_tumor - dplyr::first(dist_tumor)) %>%
    ungroup()

#### check tracks
master_M4_test <- master_cor2 %>%
    group_by(position, mouse) %>%
    mutate(x = x - min(x), y = y - min(y))
g <- ggplot(master_M4_test) + geom_path(aes(x, y, group = Track2, colour = dist_tumor), arrow = arrow(ends = "last",
    type = "closed", length = unit(0.005, "inches"))) + theme_bw()
g <- g + theme(legend.position = "none") + scale_colour_gradient2(low = "blue", mid = "grey", high = "red") +
    facet_wrap(. ~ position)
g


Number of unqiue tracks that are processed:

length(unique(master_cor2$Track2))
## [1] 981

Scaling

Now we need to do some feature engineering, since we make a sliding window analysis we need to requantify the values of displacement, speed, etc from scratch

## scaling
library(parallel)
library(dplyr)
library(dtwclust)
library(stats)
library(scales)
# detach(package:spatstat, unload = TRUE) normalize the data:
master_norm <- master_cor2 %>%
    ungroup()
column_names2 <- names(master_cor2)
## select what data to normalize
column_names2 <- subset(column_names2, !column_names2 %in% c("Time", "Track2", "x", "y", "z", "position",
    "mouse", "class", "Time2", "dist_tumor_1", "dist_3_neigh", "dist_10_neigh", "iteration", "Track2"))
master_norm <- as.data.frame(master_norm)

# transform skwed variables Calculate skewness for each variable
skew_values <- apply(master_norm[c(column_names2)], 2, skewness)

# Identify variables with skewness greater than a threshold (e.g., 0.5)
skewed_variables <- names(skew_values[abs(skew_values) > 2])
# Apply logarithmic transformation to skewed variables
master_norm[, skewed_variables] <- log1p(master_norm[, skewed_variables])

PCA analysis

We performa a PCA dimensionality reduction to prepare for the multivariate analysis

### perfrom PCA on the factors I want to use
master_scaled <- master_norm[c(column_names2)] %>%
    mutate(across(everything(), scale)) %>%
    ungroup()  ##first scale
## now to give more weight to the varibale of dist_tumor so the ability of cells to leave
## master_scaled[,c('dist_tumor')]<-master_scaled[,c('dist_tumor')]*2

master_pca <- prcomp(master_norm[c(column_names2)], scale = T)

master_pca2 <- master_pca[["x"]]  ##select only PC
master_pca3 <- data.frame(master_norm[, c(2)], master_pca2[, c(1:3)])  ### take only first 3 components since they explain most variation

colnames(master_pca3)[1] <- "Track2"


Plot the PCA information:

pcaCharts <- function(x) {
    x.var <- x$sdev^2
    x.pvar <- x.var/sum(x.var)
    print("proportions of variance:")
    print(x.pvar)

    par(mfrow = c(2, 2))
    plot(x.pvar, xlab = "Principal component", ylab = "Proportion of variance explained", ylim = c(0,
        1), type = "b")
    plot(cumsum(x.pvar), xlab = "Principal component", ylab = "Cumulative Proportion of variance explained",
        ylim = c(0, 1), type = "b")
    screeplot(x, main = "Absolute Variance", xlab = "Principal Component")
    screeplot(x, type = "l", main = "Absolute Variance")
    par(mfrow = c(1, 1))
}

pcaCharts(master_pca)
## [1] "proportions of variance:"
## [1] 0.5643365414 0.1929346874 0.1787666118 0.0631326010 0.0008295584

📊 Modules

Once we have normalized, adjusted and prepared all the data for analysis, it is time to proceed to the analysis modules.

The BEHAV3D Tumor Profiler is divided in three distinct modules:

1. Heterogeneity Module - Implements multiparametric single-cell time-series classification, allowing us to identify distinct single-cell behavioral patterns

Optional modules:

2. Large-scale phenotyping module - Performs large-scale TME phenotyping and identifies regions with a specific cellular composition and architecture within the TME of intravitally imaged tumors

3. Small-scale phenotyping module - Further refines TME phenotyping to better understand tumor cell behavior

The optional modules are already including the necessary sections for module 1 to allow for an easy run.

Modules are collapsed to facilitate navigation. Please unscroll the modules you would like to check

1. Heterogeneity Module

Click to expand the Heterogeneity module

Dynamic Time Warping

Dynamic Time Warping (DTW) is an algorithm used to measure similarity between two time series by aligning them in a way that minimizes the distance between corresponding points. It is particularly useful for comparing time series that may vary in speed or timing, like on this case.

If the matrix_distmat.rds file exists, it will skip the DTW analysis time, as it has been run previously. Else, it will run normally and generate the file at the end.

### MULTIVARIATE
list_master_pca <- split(master_pca3[, -c(1)], master_pca3$Track2)  ## split into list

if (!file.exists("matrix_distmat.rds")) {
    ## parallel working load parallel
    library(parallel)
    # create multi-process workers
    workers <- makeCluster(detectCores() - 2)
    # load dtwclust in each one, and make them use 1 thread per worker
    invisible(clusterEvalQ(workers, {
        library(dtwclust)
        RcppParallel::setThreadOptions(1L)
    }))
    # register your workers, e.g. with doParallel
    require(doParallel)
    registerDoParallel(workers)

    ### MULTIVARIATE
    set.seed(123)
    distmat <- proxy::dist(list_master_pca, method = "dtw")
    saveRDS(distmat, file = "distmat.rds")

    matrix_distmat <- as.matrix(distmat)
    saveRDS(matrix_distmat, file = "matrix_distmat.rds")


} else {
    matrix_distmat <- readRDS("matrix_distmat.rds")
}


Track2 <- as.numeric(names(list_master_pca))

UMAP and clustering

Now, we can perform the UMAP representation and cluster in the desired number of clusters. Any of the parameters can be modified to your convenience

# Adjust parameters according to the experiment characteristics
umap_dist <- umap(matrix_distmat, n_components = 2, input = "dist", init = "random", n_neighbors = 7,
    min_dist = 0.5, spread = 6, n_epochs = 1000, local_connectivity = 1)

umap_1 <- as.data.frame(umap_dist$layout)
# scatter3Drgl(umap_1$V1, umap_1$V2, umap_1$V3)


Plot of the raw UMAP

plot(umap_dist$`layout`, # Plot the result
     col=rgb(0,0,0,alpha=0.1), 
     pch=19,
     asp=0.4, xlab="UMAP 1", ylab="UMAP 2",
     main="Raw UMAP")


Here, we determine the number of clusters for the analysis. (n_cluster)

umap_1 <- as.data.frame(umap_dist$layout)
Track2_umap <- cbind(Track2, umap_1)
positiontype <- master_norm[, c("Track2", "position", "class", "mouse")]
positiontype <- positiontype[!duplicated(positiontype$Track2), ]
umap_2 <- left_join(Track2_umap, positiontype)

# Clustering
n_cluster = 7
hc <- hclust(dist(umap_dist$layout, method = "euclidean"), method = "ward.D2")
hierarchical_clusters <- cutree(hc, k = n_cluster)
umap_3 <- cbind(hierarchical_clusters, umap_2)

colnames(umap_3)[1] <- "cluster2"


Here. we can obtain different UMAP representations, providing our color palette

## plot

# Make a color palette that adjusts to the cluster direction Generate a continuous palette
# with 100 colors
continuous_palette <- colorRampPalette(c("cyan4", "darkturquoise", "darkorchid4", "darkorchid1",
    "deeppink4", "deeppink1", "goldenrod3", "gold"))(50)

# Select 8 colors from the continuous palette
mypalette_1 <- continuous_palette[seq(1, length(continuous_palette), length.out = n_cluster)]

ggplot(umap_3, aes(x = V1, y = V2, color = as.factor(cluster2))) + geom_point(size = 2, alpha = 0.8) +
    labs(color = "cluster") + xlab("") + ylab("") + ggtitle("umap Cluster") + scale_color_manual(values = mypalette_1) +
    theme_light(base_size = 20) + theme(axis.text.x = element_blank(), axis.text.y = element_blank()) +
    theme(aspect.ratio = 1)


UMAP positions colored by large-scale regions

ggplot(umap_3, aes(x = V1, y = V2, color = as.factor(class))) + geom_point(size = 0.5, alpha = 0.8) +
    labs(color = "Environment_cluster") + xlab("") + ylab("") + ggtitle("umap Cluster") + scale_color_manual(values = c("cyan",
    "gold", "red")) + theme_light(base_size = 20) + theme(axis.text.x = element_blank(), axis.text.y = element_blank()) +
    theme(aspect.ratio = 1) + facet_wrap(mouse ~ position)


UMAP colored by large-scale regions

ggplot(umap_3, aes(x = V1, y = V2, color = as.factor(class))) + geom_point(size = 2, alpha = 0.6) +
    labs(color = "Tumor region phenotype") + xlab("") + ylab("") + ggtitle("distribution of large scale regions") +
    scale_color_manual(values = c("cyan", "gold", "red")) + theme_light(base_size = 20) + theme(axis.text.x = element_blank(),
    axis.text.y = element_blank()) + theme(aspect.ratio = 1)

### UMAP direction

Here, we incorporate the MG and SR101 information to project the direction information into the UMAP

### calculate average stats per track
master_class <- left_join(master_cor2, umap_3[c(1:4)], by = c(Track2 = "Track2"))
master_class <- master_class %>%
    filter(cluster2 != 0)
master_class$cluster2 <- as.numeric(master_class$cluster2)
master_class_sum <- master_class %>%
    group_by(position, class, mouse, Track2) %>%
    arrange(Time) %>%
    summarize(cluster2 = mean(cluster2), V1 = mean(V1), V2 = mean(V2), dist_tumor = mean(dist_tumor),
        dist_3_neigh = mean(dist_3_neigh), dist_10_neigh = mean(dist_10_neigh), speed = mean(speed),
        disp2 = mean(disp2), disp_d = mean(disp_d), disp_l = mean(disp_l), movement = last(dist_tumor),
        distance_to_tumor = last(dist_tumor_1), raw_dist_tumor = last(dist_tumor), Time = first(Time)) %>%
    ungroup()


## Join the information on the SR101 and MG
master_class_sum <- left_join(master_class_sum, master_distance_MG[c("Track2", "n_MG", "min_MG")],
    by = c(Track2 = "Track2"))
master_class_sum <- left_join(master_class_sum, master_distance_SR101[c("Track2", "n_SR101", "min_SR101")],
    by = c(Track2 = "Track2"))
master_class_sum <- left_join(master_class_sum, BV_df_sum, by = c(Track2 = "Track2"))

mid <- 0
master_class_sum$movement2 <- ifelse(master_class_sum$movement > 0, "away from tumor", ifelse(master_class_sum$movement ==
    0, "no movement", "towards the tumor"))
master_class_sum <- master_class_sum[!is.na(master_class_sum$cluster2), ]

saveRDS(master_class_sum, "master_class_sum.rds")
write.csv(master_class_sum, "master_class_sum.csv")


And here is the resulting plot:

ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = movement)) + geom_point(size = 2,
    alpha = 0.8) + labs(color = "Direction") + xlab("") + ylab("") + ggtitle("Direction of movement relative to tumor core") +
    theme_light(base_size = 20) + theme(axis.text.x = element_blank(), axis.text.y = element_blank()) +
    theme(aspect.ratio = 1) + theme(aspect.ratio = 1) + scale_color_gradient2(midpoint = mid, low = "blue",
    mid = "grey80", high = "red3")

Heatmap

In this section, we project the relevant clustering features in a Heatmap

### Create heatmap
sum_all <- master_class_sum %>%
    group_by(cluster2) %>%
    summarise(displacement2 = median(disp2), speed = median(speed), displacement_delta = median(disp_d),
        displacement_length = median(disp_l), max_speed = quantile(speed, 0.75), persistance = (displacement_length/displacement_delta),
        nearest_10_cell = median(dist_10_neigh), dist_tumor_core = median(distance_to_tumor), n_SR101 = mean(n_SR101,
            na.rm = T), min_SR101 = mean(min_SR101, na.rm = T), n_MG = mean(n_MG, na.rm = T), min_MG = mean(min_MG,
            na.rm = T), min_BV_dist = mean(BV_min, na.rm = T), sd_BV_dist = mean(BV_sd, na.rm = T),
        mean_BV_dist = mean(BV_mean, na.rm = T), movement = median(movement))

Cluster_movement <- sum_all[, c(1, length(names(sum_all)))]
Cluster_movement <- Cluster_movement[order(Cluster_movement$movement), ]

stdize = function(x, ...) {
    (x - min(x, ...))/(max(x, ...) - min(x, ...))
}
sum_all[-c(1)] <- lapply(sum_all[-c(1)], stdize, na.rm = T)



df <- sum_all[, -c(1)]
df <- df[order(match(rownames(df), Cluster_movement$cluster2)), , drop = FALSE] %>%
    ungroup()
df <- as.data.frame(df)
rownames(df) <- Cluster_movement$cluster2

We show all the features extracted in the analysis:

heat_m <- pheatmap(df, clustering_method = "complete", cluster_cols = F, cluster_rows = F, cellwidth = 20,
    cellheight = 20, treeheight_row = 0, treeheight_col = 0, fontsize = 8, na_col = "grey50", main = "Features",
    angle_col = 90, color = colorRampPalette(c("deepskyblue1", "grey95", "deeppink2"))(50))

df_cl_features <- df[, c("displacement2", "speed", "max_speed", "displacement_delta", "displacement_length",
    "persistance", "movement")]
rownames(df_cl_features) <- rownames(df)


And a heatmap of the features used for clustering:

pheatmap(df_cl_features, clustering_method = "complete", cluster_cols = F, cluster_rows = F, cellwidth = 20,
    cellheight = 20, treeheight_row = 0, treeheight_col = 0, fontsize = 8, na_col = "grey50", main = "Features used for clustering",
    angle_col = 90, color = colorRampPalette(c("deepskyblue1", "grey95", "deeppink2"))(50))


We also print tha Cluster movement data (towards or away from the tumor):

print(Cluster_movement)  #3 per cluster we can see the direction
## # A tibble: 7 × 2
##   cluster2 movement
##      <dbl>    <dbl>
## 1        1   -2.77 
## 2        7   -0.586
## 3        4   -0.180
## 4        6    0.786
## 5        2    0.980
## 6        5    1.95 
## 7        3    8.60

Movement within UMAP Clusters

Plots the clusters according to direction and other relevant features

## Movement within the clusters in UMAP

## plot clusters giving them a color according to direction Convert df to a data frame Add
## cluster2 as a column
df1 <- as.data.frame(df)

df1 <- mutate(df1, cluster2 = row.names(df))

# Order df1 by movement and reorder cluster2 accordingly
df1 <- df1 %>%
    arrange(movement) %>%
    mutate(cluster2 = factor(cluster2, levels = unique(cluster2)))
## set first order clusters based on the heatmap:


UMAP representation with the behavioral clusters:

ggplot(master_class_sum, aes(x = V1, y = V2, color = as.factor(cluster2))) +  
  geom_point(size = 2, alpha = 1) + 
  labs(color = "Cluster") +
  xlab("") + 
  ylab("") +
  scale_color_manual(values = setNames(mypalette_1, levels(df1$cluster2))) +  # Match palette to cluster2 levels  ggtitle("UMAP Cluster") +
  theme_light(base_size = 20) +
  theme(axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        aspect.ratio = 1)


UMAP representation with the behavioral clusters among mice:

ggplot(umap_3, aes(x = V1, y = V2, color = as.factor(cluster2))) + geom_point(size = 1, alpha = 0.8) +
    labs(color = "cluster") + xlab("") + ylab("") + ggtitle("umap Cluster among mice") + scale_color_manual(values = setNames(mypalette_1,
    levels(df1$cluster2))) + theme_light(base_size = 20) + theme(axis.text.x = element_blank(),
    axis.text.y = element_blank()) + theme(aspect.ratio = 1) + facet_wrap(mouse ~ .)


UMAP representation with the localization bistribution of each track:

ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = distance_to_tumor)) + geom_point(size = 2,
    alpha = 0.6) + xlab("") + ylab("") + ggtitle("UMAP localization distribution") + theme_light(base_size = 20) +
    theme(axis.text.x = element_blank(), axis.text.y = element_blank()) + theme(aspect.ratio = 1) +
    scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")))


UMAP representation with the raw movement of each track:

ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = movement)) + geom_point(size = 2,
    alpha = 0.6) + xlab("") + ylab("") + ggtitle("UMAP raw_movement") + theme_light(base_size = 20) +
    theme(axis.text.x = element_blank(), axis.text.y = element_blank()) + theme(aspect.ratio = 1) +
    scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")))


UMAP representation with the speed distribution of each track:

ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = speed)) + geom_point(size = 2, alpha = 0.6) +
    xlab("") + ylab("") + ggtitle("UMAP speed distribution") + theme_light(base_size = 20) + theme(axis.text.x = element_blank(),
    axis.text.y = element_blank()) + theme(aspect.ratio = 1) + scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4,
    "Spectral")))


UMAP representation with the squared displacement disitribution of each track:

ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = disp2)) + geom_point(size = 2, alpha = 0.6) +
    xlab("") + ylab("") + ggtitle("UMAP disp2distribution") + theme_light(base_size = 20) + theme(axis.text.x = element_blank(),
    axis.text.y = element_blank()) + theme(aspect.ratio = 1) + scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4,
    "Spectral")))


UMAP representation with the displacement length of each track:

ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = disp_l)) + geom_point(size = 2, alpha = 0.6) +
    xlab("") + ylab("") + ggtitle("UMAP disp_length") + theme_light(base_size = 20) + theme(axis.text.x = element_blank(),
    axis.text.y = element_blank()) + theme(aspect.ratio = 1) + scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4,
    "Spectral")))

Cluster distibution

Provides a pie chart of the cluster distribution along the mice experiments

### plot enrichment on edge and core:
Number_cell_exp <- umap_3 %>%
    group_by(position, class, mouse) %>%
    summarise(total_cell = n())
Percentage_clus <- left_join(Number_cell_exp, umap_3)
Percentage_clus <- Percentage_clus %>%
    group_by(cluster2, position, class, mouse) %>%
    summarise(total_cell = mean(total_cell), num_cluster = n()) %>%
    mutate(percentage = num_cluster * 100/total_cell) %>%
    ungroup()
Percentage_clus_2 <- Percentage_clus %>%
    group_by(cluster2, position, class, mouse) %>%
    summarise(se_percentage = sd(percentage)/sqrt(length(percentage)), percentage = mean(percentage))
### Plot circular map
Percentage_clus_2$cluster2 <- as.factor(Percentage_clus_2$cluster2)
Percentage_clus_2$cluster2 <- factor(Percentage_clus_2$cluster2, levels = levels(df1$cluster2))


Behavioral cluster distribution in each mouse

Per2 <- ggplot(Percentage_clus_2, aes(fill = as.factor(cluster2), y = percentage, x = "")) + geom_bar(stat = "identity",
    position = "fill", width = 0.5) + coord_flip() + scale_y_reverse()
Per2 <- Per2 + facet_grid(. ~ mouse)
Per2 <- Per2 + theme_void() + scale_fill_manual(values = mypalette_1) + theme(strip.text.x = element_text(size = 8,
    angle = 90))
Per2


Mouse distribution in eacch behavioral cluster

Per4 <- ggplot(Percentage_clus_2, aes(fill = as.factor(mouse), y = percentage, x = "")) + geom_bar(stat = "identity",
    position = "fill", width = 0.5) + coord_flip() + scale_y_reverse()
Per4 <- Per4 + facet_grid(cluster2 ~ .)
Per4 <- Per4 + theme_void() + theme(strip.text.x = element_text(size = 8, angle = 90))
Per4

Differences based on clusters

This section analyzes the differences between clusters for different features and provides statistical support in each case.

Code section
### plot differences of the values per cluster
master_class_sum <- as.data.frame(master_class_sum)
master_class_sum$cluster2 <- as.character(master_class_sum$cluster2)
df1$mean_movement <- df1$movement
master_class_sum2 <- left_join(master_class_sum, df1[, c("cluster2", "mean_movement")])
master_class_sum2$cluster2 = factor(master_class_sum2$cluster2, levels = df1$cluster2)


#### try an anova between clusters based speed
p_speed <- ggplot(master_class_sum2, aes(x = as.factor(cluster2), y = speed, group = cluster2, fill = cluster2)) +
    geom_jitter(width = 0.2, alpha = 0.5) + geom_boxplot(outlier.colour = NA, alpha = 0.5) + ggtitle("speed per cluster") +
    theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(0,
    20))

a_speed <- aov(speed ~ cluster2, data = master_class_sum2)


#### try an anova between clusters based direction
p_direction <- ggplot(master_class_sum2, aes(x = as.factor(cluster2), y = movement, group = cluster2,
    fill = cluster2)) + geom_jitter(width = 0.2, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
    alpha = 0.5) + ggtitle("speed per cluster") + theme_classic() + theme(aspect.ratio = 0.7) +
    scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(-15, 20))

a_direction <- aov(movement ~ cluster2, data = master_class_sum2)


#### try an anova between clusters based on distance to dist_3_neigh
p_dist3_neigth <- ggplot(master_class_sum2, aes(x = as.factor(cluster2), y = dist_3_neigh, group = cluster2,
    fill = cluster2)) + geom_jitter(width = 0.2, alpha = 0.5) + geom_boxplot(alpha = 0.5, outlier.colour = NA) +
    ggtitle("dist 3 neigh per cluster") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
    coord_cartesian(ylim = c(10, 70))

a_dist_3_neigh <- aov(dist_3_neigh ~ cluster2, data = master_class_sum2)


#### try an anova between clusters based on distance to dist_10_neigh
p_dist_10_neigh <- ggplot(master_class_sum2, aes(x = as.factor(cluster2), y = dist_10_neigh, group = cluster2,
    fill = cluster2)) + geom_jitter(width = 0.2, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
    alpha = 0.5) + ggtitle("dist 10 neigh per cluster") + theme_classic() + theme(aspect.ratio = 0.7) +
    scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(20, 90))

a_dist_10_neigh <- aov(dist_10_neigh ~ cluster2, data = master_class_sum2)


#### try an anova between clusters based on dist to MG
p_min_MG <- ggplot(subset(master_class_sum2, !is.na(min_MG)), aes(x = as.factor(cluster2), y = min_MG,
    group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
    alpha = 0.5) + ggtitle("distance to closest MG") + theme_classic() + theme(aspect.ratio = 0.7) +
    scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(2, 45))

a_min_MG <- aov(min_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))


#### try an anova between clusters based on dist to SR101
p_min_SR101 <- ggplot(subset(master_class_sum2, !is.na(min_SR101)), aes(x = as.factor(cluster2),
    y = min_SR101, group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) +
    geom_boxplot(outlier.colour = NA, alpha = 0.5) + ggtitle("distance to closest SR101") + theme_classic() +
    theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(0,
    50))

a_min_SR101 <- aov(min_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(min_SR101)))


#### try an anova between clusters based on n of MG
p_n_MG <- ggplot(subset(master_class_sum2, !is.na(n_MG)), aes(x = as.factor(cluster2), y = n_MG,
    group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
    alpha = 0.5) + ggtitle("n MG") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
    coord_cartesian(ylim = c(0, 15))

a_n_MG <- aov(n_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))


#### try an anova between clusters based on n of SR101
p_nSR101 <- ggplot(subset(master_class_sum2, !is.na(n_SR101)), aes(x = as.factor(cluster2), y = n_SR101,
    group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
    alpha = 0.5) + ggtitle("n SR101") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
    coord_cartesian(ylim = c(0, 20))

a_nSR101 <- aov(n_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(n_SR101)))


#### try an anova between clusters based on mean distance to BV

p_BV_mean <- ggplot(subset(master_class_sum2, !is.na(BV_mean)), aes(x = as.factor(cluster2), y = BV_mean,
    group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
    alpha = 0.5) + ggtitle("BV_mean") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
    coord_cartesian(ylim = c(0, 20))

a_BV_mean <- aov(BV_mean ~ cluster2, data = subset(master_class_sum2, !is.na(BV_mean)))

#### try an anova between clusters based on min distance to BV

p_BV_min <- ggplot(subset(master_class_sum2, !is.na(BV_min)), aes(x = as.factor(cluster2), y = BV_min,
    group = cluster2, fill = cluster2)) + geom_jitter(width = 0.2, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
    alpha = 0.5) + ggtitle("BV_min") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
    coord_cartesian(ylim = c(0, 20))

a_BV_min <- aov(BV_min ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))


#### try an anova between clusters based on contact to BV

p_BV_contact <- ggplot(subset(master_class_sum2, !is.na(BV_min)), aes(x = as.factor(cluster2), y = BV_contact,
    group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
    alpha = 0.5) + ggtitle("BV_contact") + theme_classic() + scale_fill_manual(values = mypalette_1) +
    theme(aspect.ratio = 0.7)

a_BV_contact <- aov(BV_contact ~ cluster2, data = subset(master_class_sum2, !is.na(BV_contact)))


#### try an anova between clusters based on sd distance to BV

p_BV_sd <- ggplot(subset(master_class_sum2, !is.na(BV_min)), aes(x = as.factor(cluster2), y = BV_sd,
    group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
    alpha = 0.5) + ggtitle("Standart deviation distance") + theme_classic() + theme(aspect.ratio = 0.7) +
    scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(0, 1.7))

a_BV_sd <- aov(BV_sd ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))

See Statistical Information


Statistical information based on ANOVA and Tukey’s HSD (Honestly Significant Difference) test:

  • Speed
summary(a_speed)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## cluster2      6   4396   732.6    86.4 <2e-16 ***
## Residuals   974   8259     8.5                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_speed)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = speed ~ cluster2, data = master_class_sum2)
## 
## $cluster2
##            diff        lwr        upr     p adj
## 7-1 -0.78537525 -1.7950989  0.2243484 0.2458651
## 4-1 -1.43519746 -2.2804303 -0.5899646 0.0000129
## 6-1 -1.36724890 -2.2314265 -0.5030713 0.0000687
## 2-1  5.28517554  4.1197006  6.4506505 0.0000000
## 5-1  1.11075630  0.2386033  1.9829093 0.0033633
## 3-1  5.31199878  4.0381500  6.5858476 0.0000000
## 4-7 -0.64982221 -1.7138560  0.4142116 0.5453741
## 6-7 -0.58187365 -1.6610178  0.4972705 0.6868864
## 2-7  6.07055080  4.7378539  7.4032477 0.0000000
## 5-7  1.89613156  0.8105901  2.9816730 0.0000062
## 3-7  6.09737403  4.6689343  7.5258138 0.0000000
## 6-4  0.06794856 -0.8591053  0.9950024 0.9999914
## 2-4  6.72037300  5.5075425  7.9332035 0.0000000
## 5-4  2.54595376  1.6114609  3.4804466 0.0000000
## 3-4  6.74719624  5.4298820  8.0645105 0.0000000
## 2-6  6.65242444  5.4263159  7.8785330 0.0000000
## 5-6  2.47800520  1.5263429  3.4296675 0.0000000
## 3-6  6.67924768  5.3496985  8.0087969 0.0000000
## 5-2 -4.17441924 -5.4061620 -2.9426765 0.0000000
## 3-2  0.02682324 -1.5156521  1.5692985 1.0000000
## 3-5  4.20124247  2.8664956  5.5359893 0.0000000
  • Direction
summary(a_direction)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## cluster2      6   8010  1335.0   138.2 <2e-16 ***
## Residuals   974   9406     9.7                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_direction)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = movement ~ cluster2, data = master_class_sum2)
## 
## $cluster2
##           diff         lwr        upr     p adj
## 7-1  1.6666646  0.58913889  2.7441902 0.0001120
## 4-1  3.3088156  2.40682612  4.2108051 0.0000000
## 6-1  3.4597360  2.53752973  4.3819424 0.0000000
## 2-1  1.2692124  0.02547688  2.5129480 0.0419620
## 5-1  4.8981440  3.96742665  5.8288613 0.0000000
## 3-1 12.2319988 10.87261218 13.5913854 0.0000000
## 4-7  1.6421510  0.50666839  2.7776337 0.0004230
## 6-7  1.7930715  0.64146377  2.9446792 0.0000973
## 2-7 -0.3974521 -1.81963846  1.0247342 0.9823139
## 5-7  3.2314794  2.07304487  4.3899139 0.0000000
## 3-7 10.5653342  9.04097607 12.0896924 0.0000000
## 6-4  0.1509204 -0.83838426  1.1402251 0.9993683
## 2-4 -2.0396032 -3.33387417 -0.7453322 0.0000750
## 5-4  1.5893284  0.59208515  2.5865716 0.0000584
## 3-4  8.9231832  7.51741249 10.3289539 0.0000000
## 2-6 -2.1905236 -3.49896423 -0.8820830 0.0000184
## 5-6  1.4384079  0.42284234  2.4539735 0.0006171
## 3-6  8.7722627  7.35343553 10.1910900 0.0000000
## 5-2  3.6289315  2.31447839  4.9433847 0.0000000
## 3-2 10.9627864  9.31673524 12.6088375 0.0000000
## 3-5  7.3338548  5.90948096  8.7582287 0.0000000
  • Distance to 3 neighbors
summary(a_dist_3_neigh)
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## cluster2      6   4102   683.6   3.107 0.00509 **
## Residuals   974 214307   220.0                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_dist_3_neigh)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dist_3_neigh ~ cluster2, data = master_class_sum2)
## 
## $cluster2
##            diff         lwr       upr     p adj
## 7-1  0.01754239  -5.1258637  5.160949 1.0000000
## 4-1  2.82930265  -1.4762080  7.134813 0.4532049
## 6-1  2.85341888  -1.5485937  7.255431 0.4706551
## 2-1  7.57868723   1.6419034 13.515471 0.0032524
## 5-1  3.45271062  -0.9899280  7.895349 0.2467792
## 3-1  2.04498274  -4.4438440  8.533809 0.9675264
## 4-7  2.81176025  -2.6082948  8.231815 0.7249982
## 6-7  2.83587649  -2.6611491  8.332902 0.7302294
## 2-7  7.56114484   0.7725531 14.349737 0.0178448
## 5-7  3.43516823  -2.0944441  8.964781 0.5242611
## 3-7  2.02744035  -5.2488531  9.303734 0.9825831
## 6-4  0.02411623  -4.6981803  4.746413 1.0000000
## 2-4  4.74938459  -1.4286226 10.927392 0.2590727
## 5-4  0.62340798  -4.1367819  5.383598 0.9997383
## 3-4 -0.78431991  -7.4945540  5.925914 0.9998651
## 2-6  4.72526835  -1.5203754 10.970912 0.2775223
## 5-6  0.59929174  -4.2483572  5.446941 0.9998129
## 3-6 -0.80843614  -7.5809936  5.964121 0.9998475
## 5-2 -4.12597661 -10.4003203  2.148367 0.4523113
## 3-2 -5.53370449 -13.3908810  2.323472 0.3649053
## 3-5 -1.40772788  -8.2067615  5.391306 0.9964572
  • Distance to 10 neighbors
summary(a_dist_10_neigh)
##              Df Sum Sq Mean Sq F value Pr(>F)  
## cluster2      6   6625  1104.1   2.662 0.0144 *
## Residuals   974 403908   414.7                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_dist_10_neigh)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dist_10_neigh ~ cluster2, data = master_class_sum2)
## 
## $cluster2
##           diff         lwr       upr     p adj
## 7-1  0.5390424  -6.5220820  7.600167 0.9999891
## 4-1  2.6742951  -3.2365247  8.585115 0.8345260
## 6-1  4.3742336  -1.6690688 10.417536 0.3307890
## 2-1  8.8396869   0.6893739 16.990000 0.0235533
## 5-1  5.0215581  -1.0775177 11.120634 0.1861065
## 3-1  5.7174464  -3.1907385 14.625631 0.4832455
## 4-7  2.1352527  -5.3056692  9.576175 0.9797377
## 6-7  3.8351912  -3.7113996 11.381782 0.7440315
## 2-7  8.3006444  -1.0190729 17.620362 0.1177185
## 5-7  4.4825157  -3.1088119 12.073843 0.5860585
## 3-7  5.1784040  -4.8108546 15.167663 0.7256775
## 6-4  1.6999385  -4.7830657  8.182943 0.9873086
## 2-4  6.1653918  -2.3160846 14.646868 0.3255126
## 5-4  2.3472630  -4.1877630  8.882289 0.9391248
## 3-4  3.0431513  -6.1689927 12.255295 0.9591091
## 2-6  4.4654532  -4.1088780 13.039784 0.7212971
## 5-6  0.6473245  -6.0077698  7.302419 0.9999541
## 3-6  1.3432128  -7.9544919 10.640918 0.9995382
## 5-2 -3.8181288 -12.4318607  4.795603 0.8475663
## 3-2 -3.1222404 -13.9089642  7.664483 0.9788229
## 3-5  0.6958883  -8.6381641 10.029941 0.9999905
  • Minimal distance to an MG
summary(a_min_MG)
##              Df Sum Sq Mean Sq F value Pr(>F)  
## cluster2      6   2406     401   2.823 0.0102 *
## Residuals   586  83222     142                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_min_MG)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = min_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))
## 
## $cluster2
##            diff        lwr        upr     p adj
## 7-1 -2.70009641  -7.897292  2.4970992 0.7222555
## 4-1 -2.19021452  -7.069143  2.6887144 0.8385902
## 6-1 -2.87138117  -7.498036  1.7552736 0.5240296
## 2-1 -6.51903163 -12.427182 -0.6108816 0.0197966
## 5-1 -2.78033015  -7.148591  1.5879312 0.4923842
## 3-1 -6.78499267 -13.201425 -0.3685600 0.0302985
## 4-7  0.50988190  -5.312614  6.3323778 0.9999750
## 6-7 -0.17128475  -5.784078  5.4415083 1.0000000
## 2-7 -3.81893522 -10.527418  2.8895480 0.6269088
## 5-7 -0.08023373  -5.482013  5.3215454 1.0000000
## 3-7 -4.08489625 -11.245072  3.0752797 0.6244649
## 6-4 -0.68116665  -6.000617  4.6382836 0.9997677
## 2-4 -4.32881711 -10.793866  2.1362314 0.4278908
## 5-4 -0.59011563  -5.686420  4.5061892 0.9998708
## 3-4 -4.59477815 -11.527398  2.3378422 0.4409182
## 2-6 -3.64765047  -9.924500  2.6291991 0.6033035
## 5-6  0.09105102  -4.764287  4.9463894 1.0000000
## 3-6 -3.91361150 -10.671068  2.8438445 0.6072444
## 5-2  3.73870148  -2.350191  9.8275940 0.5372617
## 3-2 -0.26596104  -7.957743  7.4258212 0.9999999
## 3-5 -4.00466252 -10.587898  2.5785726 0.5487295
  • Minimal distance to an SR101
summary(a_min_SR101)
##              Df Sum Sq Mean Sq F value Pr(>F)
## cluster2      6   1563   260.6   1.619   0.14
## Residuals   381  61311   160.9
TukeyHSD(a_min_SR101)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = min_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(min_SR101)))
## 
## $cluster2
##           diff         lwr       upr     p adj
## 7-1  1.8361891  -5.4820643  9.154442 0.9897048
## 4-1  4.6631659  -0.6949473 10.021279 0.1351299
## 6-1  4.0937335  -1.7782677  9.965735 0.3748810
## 2-1  5.2990323  -3.4047456 14.002810 0.5457186
## 5-1  4.7717052  -2.0412179 11.584628 0.3690700
## 3-1  2.2523708  -7.3825575 11.887299 0.9929490
## 4-7  2.8269768  -4.5012266 10.155180 0.9141573
## 6-7  2.2575444  -5.4543676  9.969456 0.9770703
## 2-7  3.4628433  -6.5745338 13.500220 0.9486322
## 5-7  2.9355161  -5.5148750 11.385907 0.9469299
## 3-7  0.4161817 -10.4385422 11.270906 0.9999998
## 6-4 -0.5694324  -6.4538297  5.314965 0.9999542
## 2-4  0.6358664  -8.0762793  9.348012 0.9999914
## 5-4  0.1085393  -6.7150708  6.932149 1.0000000
## 3-4 -2.4107951 -12.0532831  7.231693 0.9898966
## 2-6  1.2052988  -7.8319853 10.242583 0.9997011
## 5-6  0.6779717  -6.5561610  7.912104 0.9999621
## 3-6 -1.8413627 -11.7785956  8.095870 0.9980414
## 5-2 -0.5273272 -10.2024513  9.147797 0.9999985
## 3-2 -3.0466616 -14.8798331  8.786510 0.9881985
## 3-5 -2.5193344 -13.0399864  8.001318 0.9919743
  • Number of MG
summary(a_n_MG)
##              Df Sum Sq Mean Sq F value Pr(>F)
## cluster2      6     80   13.29   1.318  0.247
## Residuals   586   5908   10.08
TukeyHSD(a_n_MG)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = n_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))
## 
## $cluster2
##            diff        lwr      upr     p adj
## 7-1  0.15849325 -1.2262713 1.543258 0.9998793
## 4-1  0.28747795 -1.0124862 1.587442 0.9948575
## 6-1  0.34226190 -0.8904852 1.575009 0.9827356
## 2-1  0.47922999 -1.0949645 2.053425 0.9724161
## 5-1  0.24664225 -0.9172573 1.410542 0.9959296
## 3-1  1.58897243 -0.1206513 3.298596 0.0881023
## 4-7  0.12898471 -1.4223878 1.680357 0.9999816
## 6-7  0.18376866 -1.3117296 1.679267 0.9998172
## 2-7  0.32073674 -1.4667022 2.108176 0.9983899
## 5-7  0.08814900 -1.3511258 1.527424 0.9999970
## 3-7  1.43047918 -0.4773109 3.338269 0.2871225
## 6-4  0.05478395 -1.3625547 1.472123 0.9999998
## 2-4  0.19175204 -1.5308251 1.914329 0.9998974
## 5-4 -0.04083571 -1.3987185 1.317047 1.0000000
## 3-4  1.30149448 -0.5456646 3.148654 0.3631688
## 2-6  0.13696809 -1.5354644 1.809401 0.9999832
## 5-6 -0.09561966 -1.3892982 1.198059 0.9999909
## 3-6  1.24671053 -0.5537770 3.047198 0.3851439
## 5-2 -0.23258774 -1.8549401 1.389765 0.9995530
## 3-2  1.10974244 -0.9396912 3.159176 0.6811010
## 3-5  1.34233018 -0.4117371 3.096397 0.2633964
  • Number of SR101
summary(a_nSR101)
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## cluster2      6    278   46.41    3.19 0.00456 **
## Residuals   381   5542   14.55                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_nSR101)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = n_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(n_SR101)))
## 
## $cluster2
##            diff       lwr        upr     p adj
## 7-1 -1.45959596 -3.659836  0.7406436 0.4379301
## 4-1 -1.93351886 -3.544440 -0.3225972 0.0076279
## 6-1 -1.54372294 -3.309145  0.2216996 0.1313095
## 2-1 -1.55862978 -4.175429  1.0581690 0.5722359
## 5-1 -2.42424242 -4.472554 -0.3759309 0.0090591
## 3-1 -0.57070707 -3.467457  2.3260429 0.9972406
## 4-7 -0.47392290 -2.677154  1.7293081 0.9955166
## 6-7 -0.08412698 -2.402720  2.2344664 0.9999999
## 2-7 -0.09903382 -3.116780  2.9187128 0.9999999
## 5-7 -0.96464646 -3.505264  1.5759713 0.9200414
## 3-7  0.88888889 -2.374594  4.1523715 0.9841590
## 6-4  0.38979592 -1.379353  2.1589453 0.9948872
## 2-4  0.37488909 -2.244425  2.9942037 0.9995503
## 5-4 -0.49072356 -2.542248  1.5608010 0.9920220
## 3-4  1.36281179 -1.536211  4.2618346 0.8052200
## 2-6 -0.01490683 -2.731975  2.7021609 1.0000000
## 5-6 -0.88051948 -3.055468  1.2944291 0.8939402
## 3-6  0.97301587 -2.014622  3.9606540 0.9610436
## 5-2 -0.86561265 -3.774448  2.0432223 0.9750862
## 3-2  0.98792271 -2.569731  4.5455765 0.9824934
## 3-5  1.85353535 -1.309508  5.0165790 0.5915556
  • Mean distance to BV
summary(a_BV_mean)
##              Df Sum Sq Mean Sq F value Pr(>F)
## cluster2      6     68   11.36   0.323  0.925
## Residuals   625  22004   35.21
TukeyHSD(a_BV_mean)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = BV_mean ~ cluster2, data = subset(master_class_sum2, !is.na(BV_mean)))
## 
## $cluster2
##            diff       lwr      upr     p adj
## 7-1 -0.26577225 -2.790540 2.258995 0.9999263
## 4-1 -0.13978253 -2.188181 1.908616 0.9999943
## 6-1 -0.10340353 -2.264545 2.057738 0.9999993
## 2-1 -0.08589028 -3.033728 2.861947 1.0000000
## 5-1  0.35652916 -2.010124 2.723182 0.9994065
## 3-1 -1.35613935 -4.836421 2.124142 0.9112516
## 4-7  0.12598973 -2.497241 2.749221 0.9999993
## 6-7  0.16236872 -2.549815 2.874552 0.9999974
## 2-7  0.17988197 -3.192825 3.552589 0.9999987
## 5-7  0.62230141 -2.256319 3.500922 0.9954657
## 3-7 -1.09036710 -4.937154 2.756419 0.9808229
## 6-4  0.03637899 -2.239016 2.311774 1.0000000
## 2-4  0.05389225 -2.978704 3.086488 1.0000000
## 5-4  0.49631168 -1.975113 2.967736 0.9969812
## 3-4 -1.21635682 -4.768715 2.336002 0.9510608
## 2-6  0.01751325 -3.092348 3.127374 1.0000000
## 5-6  0.45993269 -2.105713 3.025578 0.9984014
## 3-6 -1.25273581 -4.871278 2.365806 0.9484331
## 5-2  0.44241943 -2.813614 3.698452 0.9996728
## 3-2 -1.27024907 -5.407043 2.866545 0.9712444
## 3-5 -1.71266850 -5.457580 2.032243 0.8264509
  • Minimal distance to BV
summary(a_BV_min)
##              Df Sum Sq Mean Sq F value Pr(>F)  
## cluster2      6    312   51.96   1.826 0.0916 .
## Residuals   625  17782   28.45                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_BV_min)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = BV_min ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))
## 
## $cluster2
##            diff       lwr       upr     p adj
## 7-1 -0.18579179 -2.455449 2.0838654 0.9999833
## 4-1  0.75936189 -1.082060 2.6007836 0.8864390
## 6-1  0.55907650 -1.383697 2.5018496 0.9792776
## 2-1 -1.62580107 -4.275780 1.0241781 0.5385067
## 5-1  0.06627626 -2.061243 2.1937957 0.9999999
## 3-1 -1.65308922 -4.781712 1.4755340 0.7061094
## 4-7  0.94515369 -1.413018 3.3033255 0.8995563
## 6-7  0.74486829 -1.693268 3.1830044 0.9719640
## 2-7 -1.44000928 -4.471928 1.5919092 0.7992332
## 5-7  0.25206806 -2.335688 2.8398237 0.9999533
## 3-7 -1.46729742 -4.925393 1.9907980 0.8719077
## 6-4 -0.20028539 -2.245768 1.8451970 0.9999518
## 2-4 -2.38516296 -5.111336 0.3410103 0.1313725
## 5-4 -0.69308563 -2.914790 1.5286186 0.9688877
## 3-4 -2.41245111 -5.605868 0.7809662 0.2784772
## 2-6 -2.18487757 -4.980509 0.6107536 0.2398941
## 5-6 -0.49280023 -2.799205 1.8136045 0.9957478
## 3-6 -2.21216571 -5.465079 1.0407480 0.4082676
## 5-2  1.69207734 -1.234956 4.6191109 0.6096086
## 3-2 -0.02728814 -3.746088 3.6915114 1.0000000
## 3-5 -1.71936548 -5.085880 1.6471489 0.7384177
  • BV contact
summary(a_BV_contact)
##              Df Sum Sq Mean Sq F value Pr(>F)
## cluster2      6   0.85  0.1413   0.871  0.516
## Residuals   625 101.37  0.1622
TukeyHSD(a_BV_contact)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = BV_contact ~ cluster2, data = subset(master_class_sum2, !is.na(BV_contact)))
## 
## $cluster2
##              diff         lwr        upr     p adj
## 7-1 -0.0242750806 -0.19563914 0.14708898 0.9995836
## 4-1 -0.0229229275 -0.16195428 0.11610842 0.9990053
## 6-1 -0.0009914992 -0.14767510 0.14569210 1.0000000
## 2-1 -0.0178269590 -0.21790616 0.18225224 0.9999724
## 5-1  0.0761225829 -0.08450977 0.23675493 0.8009201
## 3-1 -0.0924266186 -0.32864449 0.14379125 0.9096130
## 4-7  0.0013521532 -0.17669496 0.17939926 1.0000000
## 6-7  0.0232835814 -0.16080100 0.20736817 0.9997841
## 2-7  0.0064481216 -0.22246833 0.23536457 1.0000000
## 5-7  0.1003976635 -0.09498352 0.29577885 0.7327874
## 3-7 -0.0681515380 -0.32924527 0.19294220 0.9875043
## 6-4  0.0219314282 -0.13250695 0.17636980 0.9995777
## 2-4  0.0050959684 -0.20073605 0.21092798 1.0000000
## 5-4  0.0990455103 -0.06869800 0.26678902 0.5848464
## 3-4 -0.0695036912 -0.31061366 0.17160627 0.9790956
## 2-6 -0.0168354598 -0.22791170 0.19424078 0.9999857
## 5-6  0.0771140821 -0.09702450 0.25125267 0.8473359
## 3-6 -0.0914351194 -0.33703719 0.15416695 0.9277082
## 5-2  0.0939495419 -0.12704786 0.31494695 0.8708864
## 3-2 -0.0745996596 -0.35537712 0.20617780 0.9862871
## 3-5 -0.1685492015 -0.42272837 0.08562997 0.4404848
  • BV standard deviation (sd)
summary(a_BV_sd)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## cluster2      6   7.80  1.3002   12.02 8.36e-13 ***
## Residuals   625  67.59  0.1081                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_BV_sd)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = BV_sd ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))
## 
## $cluster2
##             diff         lwr         upr     p adj
## 7-1 -0.007171339 -0.14710581  0.13276313 0.9999990
## 4-1 -0.125418351 -0.23895020 -0.01188650 0.0195411
## 6-1 -0.138910342 -0.25869095 -0.01912973 0.0114031
## 2-1  0.264153311  0.10077030  0.42753632 0.0000443
## 5-1  0.056369400 -0.07480164  0.18754044 0.8649533
## 3-1  0.123263336 -0.06963021  0.31615689 0.4875780
## 4-7 -0.118247012 -0.26363880  0.02714478 0.1977911
## 6-7 -0.131739002 -0.28206095  0.01858294 0.1300277
## 2-7  0.271324650  0.08439338  0.45825592 0.0004068
## 5-7  0.063540739 -0.09600592  0.22308739 0.9023048
## 3-7  0.130434676 -0.08277230  0.34364165 0.5420015
## 6-4 -0.013491991 -0.13960508  0.11262110 0.9999189
## 2-4  0.389571662  0.22149095  0.55765238 0.0000000
## 5-4  0.181787751  0.04480979  0.31876571 0.0018504
## 3-4  0.248681687  0.05179329  0.44557008 0.0038276
## 2-6  0.403063652  0.23070055  0.57542676 0.0000000
## 5-6  0.195279742  0.05307962  0.33747986 0.0010715
## 3-6  0.262173678  0.06161707  0.46273029 0.0023283
## 5-2 -0.207783911 -0.38824856 -0.02731927 0.0123953
## 3-2 -0.140889974 -0.37017052  0.08839057 0.5365537
## 3-5  0.066893937 -0.14066666  0.27445453 0.9634729


The output in this section shows a boxplot for each feature in each behavioral cluster

  • Speed
p_speed

  • Direction
p_direction

  • Distance to 3 neighbors
p_dist3_neigth

  • Distance to 10 neighbors
p_dist_10_neigh

  • Minimal distance to an MG
p_min_MG

  • Minimal distance to a SR101
p_min_SR101

  • Number of MG
p_n_MG

  • Number of SR101
p_nSR101

  • Mean distance to BV
p_BV_mean

  • Minimal distance to BV
p_BV_min

  • BV contact
p_BV_contact

  • BV standard deviation (sd)
p_BV_sd

Backprojection

Select a position of interest to save .txt information about the specified position of interest in a .txt file.
This code section uses the master_distance.rds file generated in the File Import section

# Determine position of interest
position_interest <- "2430F11.CL3_1"

names_cell <- master_class[!duplicated(master_class$mouse), ]
names_cell$position
## [1] 2430F11.CL3_1 2430F13.CL2_2 2430F16.CL3_1 2430F17.CL3_2 2430F18.CL2_1
## [6] 2430F12.CL2_2
## 36 Levels: 2430F11.CL1_1 2430F12.CL1_1 2430F13.CL1_1 ... 2430F18.CL3_2
Tracks_1 <- master_class[which(master_class$position == position_interest), ]
Tracks_1 <- Tracks_1[!duplicated(Tracks_1$Track2), c(2, which(colnames(master_class) == "cluster2"))]
## Join with information on unique TrackID for that experiment

# Generated during the data import
master_distance <- readRDS("D:/BEHAV3D_Tumor_Profiler/results/master_distance.rds")
master_distance <- master_distance[c("TrackID", "Track2")] %>%
    distinct(TrackID, .keep_all = TRUE)
Tracks_1 <- left_join(Tracks_1, master_distance)
Tracks_1$Track2 <- NULL
Track_1_list <- split(Tracks_1, Tracks_1$cluster2)

write(paste(as.character(Track_1_list), sep = "' '", collapse = ", "), paste0(position_interest,
    ".txt"))

Then, we plot the cell trajectories combined with the behavioral cluster information:

### Plot cell trajectories:

## bind cluster information data to original dataset:
Tracks_1 <- master_class
### plot all the tracks
Tracks_1 <- Tracks_1[!duplicated(Tracks_1$Track2), c("cluster2", "Track2")]

Tracks_1$cluster2 <- as.factor(Tracks_1$cluster2)
master_traject <- left_join(master_cor2, Tracks_1)

master_traject <- na.omit(master_traject)
master_traject <- subset(master_traject, cluster2 != 0)
master_traject <- master_traject %>%
    group_by(position, mouse) %>%
    mutate(x_new = x - min(x), y_new = y - min(y))

master_traject$cluster2 = factor(master_traject$cluster2, levels = df1$cluster2)
  • Cell Trajectories in all positions in the dataset, colored by behavioral cluster:
g_all_track <- ggplot(master_traject) + geom_path(aes(x_new, y_new, group = Track2, color = as.factor(cluster2)),
    size = 0.2, arrow = arrow(ends = "last", type = "open", length = unit(0.01, "inches"))) + scale_color_manual(values = mypalette_1) +
    theme_bw() + theme(aspect.ratio = 1) + facet_wrap(class ~ position)
g_all_track

  • Cell trajectories in the position of interest, colored by behavioral cluster:
g_track_int1 <- ggplot(subset(master_traject, position %in% position_interest)) + geom_path(aes(x_new,
    y_new, group = Track2, color = as.factor(cluster2)), size = 0.5, arrow = arrow(ends = "last",
    type = "open", length = unit(0, "inches"))) + scale_color_manual(values = mypalette_1) + ggtitle(paste0(position_interest)) +
    theme_bw() + theme(aspect.ratio = 1)
g_track_int1

  • Raw cell trajectories in the position of interest
g_track_int2 <- ggplot(subset(master_traject, position %in% position_interest)) + geom_path(aes(x_new,
    y_new, group = Track2), size = 0.5, arrow = arrow(ends = "last", type = "open", length = unit(0,
    "inches"))) + scale_color_manual(values = mypalette_1) + ggtitle(paste0(position_interest)) +
    theme_bw() + theme(aspect.ratio = 1)
g_track_int2

1 + 2. Large-scale phenotyping Module

Click to expand the Large-scale phenotyping module

Dynamic Time Warping

Dynamic Time Warping (DTW) is an algorithm used to measure similarity between two time series by aligning them in a way that minimizes the distance between corresponding points. It is particularly useful for comparing time series that may vary in speed or timing, like on this case.

If the matrix_distmat.rds file exists, it will skip the DTW analysis time, as it has been run previously. Else, it will run normally and generate the file at the end.

### MULTIVARIATE
list_master_pca <- split(master_pca3[, -c(1)], master_pca3$Track2)  ## split into list

if (!file.exists("matrix_distmat.rds")) {
    ## parallel working load parallel
    library(parallel)
    # create multi-process workers
    workers <- makeCluster(detectCores() - 2)
    # load dtwclust in each one, and make them use 1 thread per worker
    invisible(clusterEvalQ(workers, {
        library(dtwclust)
        RcppParallel::setThreadOptions(1L)
    }))
    # register your workers, e.g. with doParallel
    require(doParallel)
    registerDoParallel(workers)

    ### MULTIVARIATE
    set.seed(123)
    distmat <- proxy::dist(list_master_pca, method = "dtw")
    saveRDS(distmat, file = "distmat.rds")

    matrix_distmat <- as.matrix(distmat)
    saveRDS(matrix_distmat, file = "matrix_distmat.rds")


} else {
    matrix_distmat <- readRDS("matrix_distmat.rds")
}


Track2 <- as.numeric(names(list_master_pca))

UMAP and clustering

Now, we can perform the UMAP representation and cluster in the desired number of clusters. Any of the parameters can be modified to your convenience

# Modify the parameters according to the characteristics of the experiment
umap_dist <- umap(matrix_distmat, n_components = 2, input = "dist", init = "random", n_neighbors = 7,
    min_dist = 0.5, spread = 6, n_epochs = 1000, local_connectivity = 1)

umap_1 <- as.data.frame(umap_dist$layout)

Here, we can see the Raw UMAP plot:

plot(umap_dist$`layout`, # Plot the result
     col=rgb(0,0,0,alpha=0.1), 
     pch=19,
     asp=0.4, xlab="UMAP 1", ylab="UMAP 2",
     main="Raw UMAP")


Here we determine the number of clusters for the analysis. (n_cluster)

umap_1 <- as.data.frame(umap_dist$layout)

Track2_umap <- cbind(Track2, umap_1)
positiontype <- master_norm[, c("Track2", "position", "class", "mouse")]
positiontype <- positiontype[!duplicated(positiontype$Track2), ]
umap_2 <- left_join(Track2_umap, positiontype)

# Clustering
n_cluster = 7
hc <- hclust(dist(umap_dist$layout, method = "euclidean"), method = "ward.D2")
hierarchical_clusters <- cutree(hc, k = n_cluster)
umap_3 <- cbind(hierarchical_clusters, umap_2)

colnames(umap_3)[1] <- "cluster2"


UMAP positions colored by large-scale regions

ggplot(umap_3, aes(x = V1, y = V2, color = as.factor(class))) + geom_point(size = 0.5, alpha = 0.8) +
    labs(color = "Environment_cluster") + xlab("") + ylab("") + ggtitle("umap Cluster") + scale_color_manual(values = c("cyan",
    "gold", "red")) + theme_light(base_size = 20) + theme(axis.text.x = element_blank(), axis.text.y = element_blank()) +
    theme(aspect.ratio = 1) + facet_wrap(mouse ~ position)


UMAP colored by large-scale regions

ggplot(umap_3, aes(x = V1, y = V2, color = as.factor(class))) + geom_point(size = 2, alpha = 0.6) +
    labs(color = "Tumor region phenotype") + xlab("") + ylab("") + ggtitle("distribution of large scale regions") +
    scale_color_manual(values = c("cyan", "gold", "red")) + theme_light(base_size = 20) + theme(axis.text.x = element_blank(),
    axis.text.y = element_blank()) + theme(aspect.ratio = 1)

UMAP direction

Here, we incorporate the MG and SR101 information to project the direction information into the UMAP
See deatils
### calculate average stats per track
master_class <- left_join(master_cor2, umap_3[c(1:4)], by = c(Track2 = "Track2"))
master_class <- master_class %>%
    filter(cluster2 != 0)
master_class$cluster2 <- as.numeric(master_class$cluster2)
master_class_sum <- master_class %>%
    group_by(position, class, mouse, Track2) %>%
    arrange(Time) %>%
    summarize(cluster2 = mean(cluster2), V1 = mean(V1), V2 = mean(V2), dist_tumor = mean(dist_tumor),
        dist_3_neigh = mean(dist_3_neigh), dist_10_neigh = mean(dist_10_neigh), speed = mean(speed),
        disp2 = mean(disp2), disp_d = mean(disp_d), disp_l = mean(disp_l), movement = last(dist_tumor),
        distance_to_tumor = last(dist_tumor_1), raw_dist_tumor = last(dist_tumor), Time = first(Time)) %>%
    ungroup()


## Join the information on the SR101 and MG
master_class_sum <- left_join(master_class_sum, master_distance_MG[c("Track2", "n_MG", "min_MG")],
    by = c(Track2 = "Track2"))
master_class_sum <- left_join(master_class_sum, master_distance_SR101[c("Track2", "n_SR101", "min_SR101")],
    by = c(Track2 = "Track2"))
master_class_sum <- left_join(master_class_sum, BV_df_sum, by = c(Track2 = "Track2"))

mid <- 0
master_class_sum$movement2 <- ifelse(master_class_sum$movement > 0, "away from tumor", ifelse(master_class_sum$movement ==
    0, "no movement", "towards the tumor"))
master_class_sum <- master_class_sum[!is.na(master_class_sum$cluster2), ]

saveRDS(master_class_sum, "master_class_sum.rds")
write.csv(master_class_sum, "master_class_sum.csv")

Cluster distibution

Provides a pie chart of the cluster distribution along the mice experiments

Show code
sum_all <- master_class_sum %>%
    group_by(cluster2) %>%
    summarise(displacement2 = median(disp2), speed = median(speed), displacement_delta = median(disp_d),
        displacement_length = median(disp_l), max_speed = quantile(speed, 0.75), persistance = (displacement_length/displacement_delta),
        nearest_10_cell = median(dist_10_neigh), dist_tumor_core = median(distance_to_tumor), n_SR101 = mean(n_SR101,
            na.rm = T), min_SR101 = mean(min_SR101, na.rm = T), n_MG = mean(n_MG, na.rm = T), min_MG = mean(min_MG,
            na.rm = T), min_BV_dist = mean(BV_min, na.rm = T), sd_BV_dist = mean(BV_sd, na.rm = T),
        mean_BV_dist = mean(BV_mean, na.rm = T), movement = median(movement))

Cluster_movement <- sum_all[, c(1, length(names(sum_all)))]
Cluster_movement <- Cluster_movement[order(Cluster_movement$movement), ]

stdize = function(x, ...) {
    (x - min(x, ...))/(max(x, ...) - min(x, ...))
}
sum_all[-c(1)] <- lapply(sum_all[-c(1)], stdize, na.rm = T)



df <- sum_all[, -c(1)]
df <- df[order(match(rownames(df), Cluster_movement$cluster2)), , drop = FALSE] %>%
    ungroup()
df <- as.data.frame(df)
rownames(df) <- Cluster_movement$cluster2

## plot clusters giving them a color according to direction Convert df to a data frame Add
## cluster2 as a column
df1 <- as.data.frame(df)

df1 <- mutate(df1, cluster2 = row.names(df))

# Order df1 by movement and reorder cluster2 accordingly
df1 <- df1 %>%
    arrange(movement) %>%
    mutate(cluster2 = factor(cluster2, levels = unique(cluster2)))

# Make a color palette that adjusts to the cluster direction Generate a continuous palette
# with 100 colors
continuous_palette <- colorRampPalette(c("cyan4", "darkturquoise", "darkorchid4", "darkorchid1",
    "deeppink4", "deeppink1", "goldenrod3", "gold"))(50)

# Select 8 colors from the continuous palette
mypalette_1 <- continuous_palette[seq(1, length(continuous_palette), length.out = n_cluster)]


### plot enrichment on edge and core:
Number_cell_exp <- umap_3 %>%
    group_by(position, class, mouse) %>%
    summarise(total_cell = n())
Percentage_clus <- left_join(Number_cell_exp, umap_3)
Percentage_clus <- Percentage_clus %>%
    group_by(cluster2, position, class, mouse) %>%
    summarise(total_cell = mean(total_cell), num_cluster = n()) %>%
    mutate(percentage = num_cluster * 100/total_cell) %>%
    ungroup()
Percentage_clus_2 <- Percentage_clus %>%
    group_by(cluster2, position, class, mouse) %>%
    summarise(se_percentage = sd(percentage)/sqrt(length(percentage)), percentage = mean(percentage))

Plots the pie chart of:

  • Behavioral cluster distribution in large-scale regions for every mouse experiment
### Plot circular map
Percentage_clus_2$cluster2 <- as.factor(Percentage_clus_2$cluster2)
Percentage_clus_2$cluster2 <- factor(Percentage_clus_2$cluster2, levels = levels(df1$cluster2))
Per1 <- ggplot(Percentage_clus_2, aes(fill = as.factor(cluster2), y = percentage, x = "")) + geom_bar(stat = "identity",
    position = "fill", width = 0.5) + coord_flip() + scale_y_reverse()
Per1 <- Per1 + facet_grid(class ~ mouse)
Per1 <- Per1 + theme_void() + scale_fill_manual(values = mypalette_1) + theme(strip.text.x = element_text(size = 8,
    angle = 90))
Per1


- Behavioral cluster distribution in large-scale regions

Per3 <- ggplot(Percentage_clus_2, aes(fill = as.factor(cluster2), y = percentage, x = "")) + geom_bar(stat = "identity",
    position = "fill", width = 0.5) + coord_flip() + scale_y_reverse()
Per3 <- Per3 + facet_grid(class ~ .)
Per3 <- Per3 + theme_void() + theme(strip.text.x = element_text(size = 8, angle = 90)) + scale_fill_manual(values = mypalette_1)
Per3

Large-scale regions’ statistics

This section provides boxplot information about features in different large-scale regions
Show code
# Additional Statistics

# Create an empty dataframe to store results
results_df <- data.frame(cluster2 = character(), p_value = numeric(), contrast = character(), pairwise_p_values = numeric())

# Create an empty list to store plots
plots_list <- list()

# Iterate over unique clusters
for (cluster in unique(Percentage_clus_2$cluster2)) {
    # Subset data for the current cluster
    cluster_data <- subset(Percentage_clus_2, cluster2 == cluster)

    # Calculate mean percentage for each mouse and subtract from original percentage
    normalized_data <- cluster_data %>%
        group_by(mouse) %>%
        mutate(normalized_percentage = (percentage - mean(percentage))/sd(percentage)) %>%
        ungroup()

    # Fit model with normalized percentage values
    model <- lm(normalized_percentage ~ class, data = normalized_data)

    # Extract estimates and p-value from model summary
    model_summary <- summary(model)
    p_value <- anova(model)$`Pr(>F)`[1]
    # Conduct pairwise comparisons using emmeans
    pairwise_comp <- emmeans(model, pairwise ~ class, adjust = "tukey")

    # Extract pairwise comparisons and p-values
    pairwise_p_values <- summary(pairwise_comp)[["contrasts"]][["p.value"]]
    contrast <- summary(pairwise_comp)[["contrasts"]][["contrast"]]
    # Append cluster name, estimates, and p-value to results dataframe
    results_df <- rbind(results_df, data.frame(cluster2 = cluster, p_value = p_value, contrast = contrast,
        pairwise_p_values = pairwise_p_values))
    # Create boxplot of normalized percentages by class
    plot <- ggplot(normalized_data, aes(x = class, y = normalized_percentage)) + geom_boxplot(width = 0.5) +
        geom_jitter(width = 0.3) + labs(title = paste("Cluster:", cluster)) + theme_bw() + theme(aspect.ratio = 1)
    # Store plot in the list
    plots_list[[cluster]] <- plot
}

Display the results dataframe:

# Display the results dataframe
print(results_df)
##    cluster2   p_value  contrast pairwise_p_values
## 1         1 0.1986608 CL1 - CL2        0.65206170
## 2         1 0.1986608 CL1 - CL3        0.58570646
## 3         1 0.1986608 CL2 - CL3        0.17343672
## 4         2 0.2429224 CL1 - CL2        0.22600442
## 5         2 0.2429224 CL1 - CL3        0.83603149
## 6         2 0.2429224 CL2 - CL3        0.49465662
## 7         3 0.2900285 CL1 - CL2        0.83026658
## 8         3 0.2900285 CL1 - CL3        0.26803502
## 9         3 0.2900285 CL2 - CL3        0.56701266
## 10        4 0.5987530 CL1 - CL2        0.92434864
## 11        4 0.5987530 CL1 - CL3        0.80043548
## 12        4 0.5987530 CL2 - CL3        0.57652596
## 13        5 0.2732207 CL1 - CL2        0.33922788
## 14        5 0.2732207 CL1 - CL3        0.34641925
## 15        5 0.2732207 CL2 - CL3        0.99741741
## 16        6 0.4322490 CL1 - CL2        0.50574638
## 17        6 0.4322490 CL1 - CL3        0.48947933
## 18        6 0.4322490 CL2 - CL3        0.99955656
## 19        7 0.0163081 CL1 - CL2        0.31132492
## 20        7 0.0163081 CL1 - CL3        0.01420838
## 21        7 0.0163081 CL2 - CL3        0.10806546


Plot the statistics in large-scale regions (environmental clusters)

plots_list
## $`1`

## 
## $`2`

## 
## $`3`

## 
## $`4`

## 
## $`5`

## 
## $`6`

## 
## $`7`

Differences based on large-scale regions

This section analyzes the differences between large-scale regions for different features and provides statistical support in each case.

Code section
#### Differences based on environmental clusters

### plot differences of the values per cluster
master_class_sum <- as.data.frame(master_class_sum)
master_class_sum$cluster2 <- as.character(master_class_sum$cluster2)
df1$mean_movement <- df1$movement
master_class_sum2 <- left_join(master_class_sum, df1[, c("cluster2", "mean_movement")])
master_class_sum2$cluster2 = factor(master_class_sum2$cluster2, levels = df1$cluster2)


#### try an anova between clusters based on distance to speed
p_class_speed <- ggplot(master_class_sum2, aes(x = as.factor(class), y = speed, group = class, fill = class)) +
    geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median, geom = "crossbar",
    size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold", "red")) + ggtitle("speed per env cluster") +
    theme_classic() + theme(aspect.ratio = 1) + coord_cartesian(ylim = c(0, 20))

a_class_speed <- aov(speed ~ class, data = master_class_sum2)


#### try an anova between clusters based on squared displacement
p_class_disp2 <- ggplot(master_class_sum2, aes(x = as.factor(class), y = disp2, group = class, fill = class)) +
    geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median, geom = "crossbar",
    size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold", "red")) + ggtitle("speed per env cluster") +
    theme_classic() + theme(aspect.ratio = 1) + coord_cartesian(ylim = c(0, 180))

a_class_disp2 <- aov(disp2 ~ class, data = master_class_sum2)


#### try an anova between clusters based on raw tumor movement
p_class_movement <- ggplot(master_class_sum2, aes(x = as.factor(class), y = movement, group = class,
    fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median, geom = "crossbar",
    size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold", "red")) + ggtitle("movement direction per env cluster") +
    theme_classic() + theme(aspect.ratio = 1) + scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4,
    "Spectral"))) + coord_cartesian(ylim = c(-10, 17))

a_class_movement <- aov(movement ~ class, data = master_class_sum2)
See Statistical Information

Statistical information between large-scale regions (environmental clusters) based on ANOVA and Tukey’s HSD (Honestly Significant Difference) test:

  • Speed
summary(a_class_speed)
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## class         2    167   83.32   6.525 0.00153 **
## Residuals   978  12488   12.77                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_speed)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = speed ~ class, data = master_class_sum2)
## 
## $class
##              diff        lwr       upr     p adj
## CL2-CL1 0.1001108 -0.5744387 0.7746603 0.9353020
## CL3-CL1 0.9064598  0.2405857 1.5723339 0.0041126
## CL3-CL2 0.8063490  0.1702051 1.4424928 0.0084220
  • Squared displacement
summary(a_class_disp2)
##              Df   Sum Sq Mean Sq F value Pr(>F)
## class         2     2820    1410   0.064  0.938
## Residuals   978 21460215   21943
TukeyHSD(a_class_disp2)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = disp2 ~ class, data = master_class_sum2)
## 
## $class
##              diff       lwr      upr     p adj
## CL2-CL1  4.144633 -23.81786 32.10713 0.9354622
## CL3-CL1  3.097067 -24.50580 30.69993 0.9624874
## CL3-CL2 -1.047566 -27.41801 25.32288 0.9952179
  • Movement
summary(a_class_movement)
##              Df Sum Sq Mean Sq F value Pr(>F)
## class         2     23   11.67   0.656  0.519
## Residuals   978  17392   17.78
TukeyHSD(a_class_movement)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = movement ~ class, data = master_class_sum2)
## 
## $class
##              diff        lwr       upr     p adj
## CL2-CL1 0.1138014 -0.6822461 0.9098489 0.9398244
## CL3-CL1 0.3680570 -0.4177525 1.1538665 0.5146473
## CL3-CL2 0.2542556 -0.4964687 1.0049800 0.7062059

The output in this section shows a boxplot for each feature in each behavioral cluster according to large-scale regions:

  • Speed
p_class_speed

  • Square displacement
p_class_disp2

  • Movement
p_class_movement


1 + 3. Small-scale phenotyping Module

Click to expand the Small-scale phenotyping module

Dynamic Time Warping

Dynamic Time Warping (DTW) is an algorithm used to measure similarity between two time series by aligning them in a way that minimizes the distance between corresponding points. It is particularly useful for comparing time series that may vary in speed or timing, like on this case.

If the matrix_distmat.rds file exists, it will skip the DTW analysis time, as it has been run previously. Else, it will run normally and generate the file at the end.

### MULTIVARIATE
list_master_pca <- split(master_pca3[, -c(1)], master_pca3$Track2)  ## split into list

if (!file.exists("matrix_distmat.rds")) {
    ## parallel working load parallel
    library(parallel)
    # create multi-process workers
    workers <- makeCluster(detectCores() - 2)
    # load dtwclust in each one, and make them use 1 thread per worker
    invisible(clusterEvalQ(workers, {
        library(dtwclust)
        RcppParallel::setThreadOptions(1L)
    }))
    # register your workers, e.g. with doParallel
    require(doParallel)
    registerDoParallel(workers)

    ### MULTIVARIATE
    set.seed(123)
    distmat <- proxy::dist(list_master_pca, method = "dtw")
    saveRDS(distmat, file = "distmat.rds")

    matrix_distmat <- as.matrix(distmat)
    saveRDS(matrix_distmat, file = "matrix_distmat.rds")


} else {
    matrix_distmat <- readRDS("matrix_distmat.rds")
}


Track2 <- as.numeric(names(list_master_pca))

UMAP and clustering

Now, we can perform the UMAP representation and cluster in the desired number of clusters. Any of the parameters can be modified to your convenience.

However, in this module, the output of the UMAP is not relevant, as it does not reflect any small-scale phenotype.

Show code


# Adjust parameters according to the characteristics of the experiment
umap_dist <- umap(matrix_distmat, n_components = 2, input = "dist", init = "random", n_neighbors = 7,
    min_dist = 0.5, spread = 6, n_epochs = 1000, local_connectivity = 1)

umap_1 <- as.data.frame(umap_dist$layout)

Track2_umap <- cbind(Track2, umap_1)
positiontype <- master_norm[, c("Track2", "position", "class", "mouse")]
positiontype <- positiontype[!duplicated(positiontype$Track2), ]
umap_2 <- left_join(Track2_umap, positiontype)

# Clustering
set.seed(123)
n_cluster = 7
hc <- hclust(dist(umap_dist$layout, method = "euclidean"), method = "ward.D2")
hierarchical_clusters <- cutree(hc, k = n_cluster)
umap_3 <- cbind(hierarchical_clusters, umap_2)
colnames(umap_3)[1] <- "cluster2"


### calculate average stats per track
master_class <- left_join(master_cor2, umap_3[c(1:4)], by = c(Track2 = "Track2"))
master_class <- master_class %>%
    filter(cluster2 != 0)
master_class$cluster2 <- as.numeric(master_class$cluster2)
master_class_sum <- master_class %>%
    group_by(position, class, mouse, Track2) %>%
    arrange(Time) %>%
    summarize(cluster2 = mean(cluster2), V1 = mean(V1), V2 = mean(V2), dist_tumor = mean(dist_tumor),
        dist_3_neigh = mean(dist_3_neigh), dist_10_neigh = mean(dist_10_neigh), speed = mean(speed),
        disp2 = mean(disp2), disp_d = mean(disp_d), disp_l = mean(disp_l), movement = last(dist_tumor),
        distance_to_tumor = last(dist_tumor_1), raw_dist_tumor = last(dist_tumor), Time = first(Time)) %>%
    ungroup()


## Join the information on the SR101 and MG
master_class_sum <- left_join(master_class_sum, master_distance_MG[c("Track2", "n_MG", "min_MG")],
    by = c(Track2 = "Track2"))
master_class_sum <- left_join(master_class_sum, master_distance_SR101[c("Track2", "n_SR101", "min_SR101")],
    by = c(Track2 = "Track2"))
master_class_sum <- left_join(master_class_sum, BV_df_sum, by = c(Track2 = "Track2"))

mid <- 0
master_class_sum$movement2 <- ifelse(master_class_sum$movement > 0, "away from tumor", ifelse(master_class_sum$movement ==
    0, "no movement", "towards the tumor"))
master_class_sum <- master_class_sum[!is.na(master_class_sum$cluster2), ]

saveRDS(master_class_sum, "master_class_sum.rds")
write.csv(master_class_sum, "master_class_sum.csv")

Heatmap

In this section, we project the relevant small-scale features in a Heatmap

### Create heatmap
sum_all <- master_class_sum %>%
    group_by(cluster2) %>%
    summarise(displacement2 = median(disp2), speed = median(speed), displacement_delta = median(disp_d),
        displacement_length = median(disp_l), max_speed = quantile(speed, 0.75), persistance = (displacement_length/displacement_delta),
        nearest_10_cell = median(dist_10_neigh), dist_tumor_core = median(distance_to_tumor), n_SR101 = mean(n_SR101,
            na.rm = T), min_SR101 = mean(min_SR101, na.rm = T), n_MG = mean(n_MG, na.rm = T), min_MG = mean(min_MG,
            na.rm = T), min_BV_dist = mean(BV_min, na.rm = T), sd_BV_dist = mean(BV_sd, na.rm = T),
        mean_BV_dist = mean(BV_mean, na.rm = T), movement = median(movement))

Cluster_movement <- sum_all[, c(1, length(names(sum_all)))]
Cluster_movement <- Cluster_movement[order(Cluster_movement$movement), ]

stdize = function(x, ...) {
    (x - min(x, ...))/(max(x, ...) - min(x, ...))
}
sum_all[-c(1)] <- lapply(sum_all[-c(1)], stdize, na.rm = T)

df <- sum_all[, -c(1)]
df <- df[order(match(rownames(df), Cluster_movement$cluster2)), , drop = FALSE] %>%
    ungroup()
df <- as.data.frame(df)
rownames(df) <- Cluster_movement$cluster2


Heatmap of all the included features:

heat_m <- pheatmap(df, clustering_method = "complete", cluster_cols = F, cluster_rows = F, cellwidth = 20,
    cellheight = 20, treeheight_row = 0, treeheight_col = 0, fontsize = 8, na_col = "grey50", main = " All features",
    angle_col = 90, color = colorRampPalette(c("deepskyblue1", "grey95", "deeppink2"))(50))

df_other_features <- df[, !colnames(df) %in% c("displacement2", "speed", "max_speed", "displacement_delta",
    "displacement_length", "persistance", "movement")]
rownames(df_other_features) <- rownames(df)


Heatmap of the small-scale features:

pheatmap(df_other_features, clustering_method = "complete", cluster_cols = F, cluster_rows = F,
    cellwidth = 20, cellheight = 20, treeheight_row = 0, treeheight_col = 0, fontsize = 8, na_col = "grey50",
    main = "Small-scale TME features", angle_col = 90, color = colorRampPalette(c("deepskyblue1",
        "grey95", "deeppink2"))(50))

Movement within UMAP Clusters

Plots the behavioral clusters according to relevant small-scale features

## plot clusters giving them a color according to direction Convert df to a data frame Add
## cluster2 as a column
df1 <- as.data.frame(df)

df1 <- mutate(df1, cluster2 = row.names(df))

# Order df1 by movement and reorder cluster2 accordingly
df1 <- df1 %>%
    arrange(movement) %>%
    mutate(cluster2 = factor(cluster2, levels = unique(cluster2)))
## set first order clusters based on the heatmap:
  • UMAP representation of the number of SR101 in 30um radius distribution:
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = n_SR101)) + geom_point(size = 2,
    alpha = 0.6) + xlab("") + ylab("") + ggtitle("UMAP number of SR101 in 30um radius distribution") +
    theme_light(base_size = 20) + theme(axis.text.x = element_blank(), axis.text.y = element_blank()) +
    theme(aspect.ratio = 1) + scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")),
    na.value = "NA")

  • UMAP representation of the minimal distance to an SR101 in 30um radius distribution:
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = min_SR101)) + geom_point(size = 2,
    alpha = 0.6) + xlab("") + ylab("") + ggtitle("UMAP min distance to SR101 in 30um radius distribution") +
    theme_light(base_size = 20) + theme(axis.text.x = element_blank(), axis.text.y = element_blank()) +
    theme(aspect.ratio = 1) + scale_color_gradientn(colours = RColorBrewer::brewer.pal(4, "Spectral"),
    na.value = "NA", trans = "pseudo_log")

  • UMAP representation of the minimal distance to an MG in 30um radius distribution:
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = min_MG)) + geom_point(size = 2, alpha = 0.6) +
    xlab("") + ylab("") + ggtitle("UMAP min distance to MG in 30um radius distribution") + theme_light(base_size = 20) +
    theme(axis.text.x = element_blank(), axis.text.y = element_blank()) + theme(aspect.ratio = 1) +
    scale_color_gradientn(colours = RColorBrewer::brewer.pal(4, "Spectral"), na.value = "NA", trans = "pseudo_log")

  • UMAP representation of the number of MG in 30um radius distribution
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = n_MG)) + geom_point(size = 2, alpha = 0.6) +
    xlab("") + ylab("") + ggtitle("UMAP number of MG in 30um radius distribution") + theme_light(base_size = 20) +
    theme(axis.text.x = element_blank(), axis.text.y = element_blank()) + theme(aspect.ratio = 1) +
    scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")), na.value = "NA")

  • UMAP representation of the mean distance to BV (Blood Vessels)
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = BV_mean)) + geom_point(size = 2,
    alpha = 0.6) + xlab("") + ylab("") + ggtitle("UMAP mean distance to Blood vessels") + theme_light(base_size = 20) +
    theme(axis.text.x = element_blank(), axis.text.y = element_blank()) + theme(aspect.ratio = 1) +
    scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")), na.value = "NA",
        trans = "pseudo_log")

  • UMAP representation of the mean contact with BV
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = BV_min)) + geom_point(size = 2, alpha = 0.6) +
    xlab("") + ylab("") + ggtitle("UMAP mean contact with Blood vessels") + theme_light(base_size = 20) +
    theme(axis.text.x = element_blank(), axis.text.y = element_blank()) + theme(aspect.ratio = 1) +
    scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")), na.value = "NA")

  • UMAP representation of the big neighbor distribution
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = dist_10_neigh)) + geom_point(size = 2,
    alpha = 0.6) + xlab("") + ylab("") + ggtitle("UMAP big neighbours distribution") + theme_light(base_size = 20) +
    theme(axis.text.x = element_blank(), axis.text.y = element_blank()) + theme(aspect.ratio = 1) +
    scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")), trans = "pseudo_log")

  • UMAP representation of the close neighbor distribution
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = dist_3_neigh)) + geom_point(size = 2,
    alpha = 0.6) + xlab("") + ylab("") + ggtitle("UMAP close neighbours distribution") + theme_light(base_size = 20) +
    theme(axis.text.x = element_blank(), axis.text.y = element_blank()) + theme(aspect.ratio = 1) +
    scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")), trans = "pseudo_log")

Differences based on clusters

This section analyzes the differences between clusters for small-scale features and provides statistical support in each case.

Code section
### plot differences of the values per cluster
master_class_sum <- as.data.frame(master_class_sum)
master_class_sum$cluster2 <- as.character(master_class_sum$cluster2)
df1$mean_movement <- df1$movement
master_class_sum2 <- left_join(master_class_sum, df1[, c("cluster2", "mean_movement")])
master_class_sum2$cluster2 = factor(master_class_sum2$cluster2, levels = df1$cluster2)


#### try an anova between clusters based on distance to dist_3_neigh
p_dist3_neigth <- ggplot(master_class_sum2, aes(x = as.factor(cluster2), y = dist_3_neigh, group = cluster2,
    fill = cluster2)) + geom_jitter(width = 0.2, alpha = 0.5) + geom_boxplot(alpha = 0.5, outlier.colour = NA) +
    ggtitle("dist 3 neigh per cluster") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
    coord_cartesian(ylim = c(10, 70))

a_dist_3_neigh <- aov(dist_3_neigh ~ cluster2, data = master_class_sum2)


#### try an anova between clusters based on distance to dist_10_neigh
p_dist_10_neigh <- ggplot(master_class_sum2, aes(x = as.factor(cluster2), y = dist_10_neigh, group = cluster2,
    fill = cluster2)) + geom_jitter(width = 0.2, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
    alpha = 0.5) + ggtitle("dist 10 neigh per cluster") + theme_classic() + theme(aspect.ratio = 0.7) +
    scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(20, 90))

a_dist_10_neigh <- aov(dist_10_neigh ~ cluster2, data = master_class_sum2)


#### try an anova between clusters based on dist to MG
p_min_MG <- ggplot(subset(master_class_sum2, !is.na(min_MG)), aes(x = as.factor(cluster2), y = min_MG,
    group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
    alpha = 0.5) + ggtitle("distance to closest MG") + theme_classic() + theme(aspect.ratio = 0.7) +
    scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(2, 45))

a_min_MG <- aov(min_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))


#### try an anova between clusters based on dist to SR101
p_min_SR101 <- ggplot(subset(master_class_sum2, !is.na(min_SR101)), aes(x = as.factor(cluster2),
    y = min_SR101, group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) +
    geom_boxplot(outlier.colour = NA, alpha = 0.5) + ggtitle("distance to closest SR101") + theme_classic() +
    theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(0,
    50))

a_min_SR101 <- aov(min_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(min_SR101)))


#### try an anova between clusters based on n of MG
p_n_MG <- ggplot(subset(master_class_sum2, !is.na(n_MG)), aes(x = as.factor(cluster2), y = n_MG,
    group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
    alpha = 0.5) + ggtitle("n MG") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
    coord_cartesian(ylim = c(0, 15))

a_n_MG <- aov(n_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))


#### try an anova between clusters based on n of SR101
p_nSR101 <- ggplot(subset(master_class_sum2, !is.na(n_SR101)), aes(x = as.factor(cluster2), y = n_SR101,
    group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
    alpha = 0.5) + ggtitle("n SR101") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
    coord_cartesian(ylim = c(0, 20))

a_nSR101 <- aov(n_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(n_SR101)))


#### try an anova between clusters based on mean distance to BV

p_BV_mean <- ggplot(subset(master_class_sum2, !is.na(BV_mean)), aes(x = as.factor(cluster2), y = BV_mean,
    group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
    alpha = 0.5) + ggtitle("BV_mean") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
    coord_cartesian(ylim = c(0, 20))

a_BV_mean <- aov(BV_mean ~ cluster2, data = subset(master_class_sum2, !is.na(BV_mean)))

#### try an anova between clusters based on min distance to BV

p_BV_min <- ggplot(subset(master_class_sum2, !is.na(BV_min)), aes(x = as.factor(cluster2), y = BV_min,
    group = cluster2, fill = cluster2)) + geom_jitter(width = 0.2, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
    alpha = 0.5) + ggtitle("BV_min") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
    coord_cartesian(ylim = c(0, 20))

a_BV_min <- aov(BV_min ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))


#### try an anova between clusters based on contact to BV

p_BV_contact <- ggplot(subset(master_class_sum2, !is.na(BV_min)), aes(x = as.factor(cluster2), y = BV_contact,
    group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
    alpha = 0.5) + ggtitle("BV_contact") + theme_classic() + scale_fill_manual(values = mypalette_1) +
    theme(aspect.ratio = 0.7)

a_BV_contact <- aov(BV_contact ~ cluster2, data = subset(master_class_sum2, !is.na(BV_contact)))


#### try an anova between clusters based on sd distance to BV

p_BV_sd <- ggplot(subset(master_class_sum2, !is.na(BV_min)), aes(x = as.factor(cluster2), y = BV_sd,
    group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
    alpha = 0.5) + ggtitle("Standart deviation distance") + theme_classic() + theme(aspect.ratio = 0.7) +
    scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(0, 1.7))

a_BV_sd <- aov(BV_sd ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))

See Statistical Information


Statistical information based on ANOVA and Tukey’s HSD (Honestly Significant Difference) test:

  • Distance to 3 neighbors
summary(a_dist_3_neigh)
##              Df Sum Sq Mean Sq F value Pr(>F)  
## cluster2      6   3361   560.1   2.537 0.0192 *
## Residuals   974 215048   220.8                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_dist_3_neigh)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dist_3_neigh ~ cluster2, data = master_class_sum2)
## 
## $cluster2
##            diff         lwr        upr     p adj
## 7-5 -6.93325357 -13.4118742 -0.4546329 0.0269198
## 1-5 -2.52894319  -8.9130363  3.8551500 0.9051609
## 4-5 -1.21175380  -7.5533433  5.1298357 0.9977331
## 2-5 -1.57231465  -7.3540749  4.2094456 0.9846451
## 3-5 -2.18592905  -8.4433504  4.0714923 0.9465650
## 6-5 -1.54163514  -8.1300708  5.0468005 0.9930844
## 1-7  4.40431038  -1.1310478  9.9396686 0.2210967
## 4-7  5.72149977   0.2352167 11.2077829 0.0344673
## 2-7  5.36093892   0.5326580 10.1892199 0.0184533
## 3-7  4.74732452  -0.6414478 10.1360969 0.1261580
## 6-7  5.39161843  -0.3782194 11.1614563 0.0849160
## 4-1  1.31718939  -4.0571405  6.6915193 0.9911286
## 2-1  0.95662853  -3.7440540  5.6573111 0.9967798
## 3-1  0.34301413  -4.9317358  5.6177641 0.9999958
## 6-1  0.98730804  -4.6761846  6.6508006 0.9986478
## 2-4 -0.36056085  -5.0033540  4.2822323 0.9999879
## 3-4 -0.97417525  -6.1974021  4.2490516 0.9980209
## 6-4 -0.32988134  -5.9454188  5.2856561 0.9999977
## 3-2 -0.61361440  -5.1407651  3.9135363 0.9996806
## 6-2  0.03067951  -4.9439818  5.0053408 1.0000000
## 6-3  0.64429391  -4.8760164  6.1646042 0.9998663
  • Distance to 10 neighbors
summary(a_dist_10_neigh)
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## cluster2      6   7913  1318.9   3.191 0.00417 **
## Residuals   974 402620   413.4                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_dist_10_neigh)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dist_10_neigh ~ cluster2, data = master_class_sum2)
## 
## $cluster2
##           diff         lwr        upr     p adj
## 7-5 -9.6344697 -18.4991323 -0.7698071 0.0230608
## 1-5 -3.6079591 -12.3432802  5.1273619 0.8863472
## 4-5 -1.3846227 -10.0617862  7.2925408 0.9991826
## 2-5 -1.1854643  -9.0966165  6.7256879 0.9994300
## 3-5 -0.6943029  -9.2562997  7.8676938 0.9999844
## 6-5 -1.9677886 -10.9827104  7.0471333 0.9952617
## 1-7  6.0265105  -1.5474910 13.6005121 0.2210762
## 4-7  8.2498470   0.7429947 15.7566993 0.0205661
## 2-7  8.4490054   1.8424939 15.0555168 0.0031684
## 3-7  8.9401668   1.5667379 16.3135956 0.0065568
## 6-7  7.6666811  -0.2281577 15.5615199 0.0635258
## 4-1  2.2233364  -5.1303309  9.5770038 0.9735949
## 2-1  2.4224948  -4.0094244  8.8544140 0.9242927
## 3-1  2.9136562  -4.3037564 10.1310688 0.8969941
## 6-1  1.6401706  -6.1091566  9.3894977 0.9959999
## 2-4  0.1991584  -6.1535511  6.5518679 0.9999999
## 3-4  0.6903198  -6.4565940  7.8372336 0.9999560
## 6-4 -0.5831659  -8.2668762  7.1005444 0.9999894
## 3-2  0.4911614  -5.7033151  6.6856379 0.9999863
## 6-2 -0.7823242  -7.5891272  6.0244787 0.9998778
## 6-3 -1.2734856  -8.8268972  6.2799259 0.9988814
  • Minimal distance to an MG
summary(a_min_MG)
##              Df Sum Sq Mean Sq F value Pr(>F)  
## cluster2      6   2139   356.4   2.502 0.0213 *
## Residuals   586  83489   142.5                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_min_MG)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = min_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))
## 
## $cluster2
##           diff        lwr       upr     p adj
## 7-5  0.3505170  -6.340404  7.041438 0.9999988
## 1-5 -0.3832598  -6.420999  5.654479 0.9999963
## 4-5 -1.5160284  -8.181438  5.149381 0.9940069
## 2-5 -1.9014326  -7.574864  3.771999 0.9557741
## 3-5 -4.2509543 -10.496643  1.994735 0.4070059
## 6-5  2.2977584  -3.919563  8.515080 0.9299992
## 1-7 -0.7337768  -6.500626  5.033072 0.9997761
## 4-7 -1.8665454  -8.287600  4.554509 0.9781511
## 2-7 -2.2519496  -7.636193  3.132294 0.8792864
## 3-7 -4.6014713 -10.585691  1.382749 0.2579487
## 6-7  1.9472414  -4.007366  7.901849 0.9607289
## 4-1 -1.1327686  -6.869998  4.604461 0.9972483
## 2-1 -1.5181728  -6.065254  3.028908 0.9565741
## 3-1 -3.8676945  -9.111429  1.376040 0.3067384
## 6-1  2.6810182  -2.528896  7.890933 0.7312330
## 2-4 -0.3854042  -5.737912  4.967103 0.9999922
## 3-4 -2.7349259  -8.690608  3.220756 0.8234714
## 6-4  3.8137868  -2.112140  9.739713 0.4783595
## 3-2 -2.3495217  -7.169303  2.470259 0.7785444
## 6-2  4.1991910  -0.583773  8.982155 0.1285495
## 6-3  6.5487127   1.099167 11.998259 0.0074197
  • Minimal distance to an SR101
summary(a_min_SR101)
##              Df Sum Sq Mean Sq F value Pr(>F)  
## cluster2      6   2041   340.1    2.13 0.0492 *
## Residuals   381  60834   159.7                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_min_SR101)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = min_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(min_SR101)))
## 
## $cluster2
##           diff         lwr       upr     p adj
## 7-5 -2.2138531 -11.5286889  7.100983 0.9922923
## 1-5  2.1984583  -8.2429279 12.639845 0.9960137
## 4-5  5.1974167  -3.8840565 14.278890 0.6188840
## 2-5  2.7409704  -6.0133709 11.495312 0.9678883
## 3-5  2.6085417  -6.6482462 11.865330 0.9811190
## 6-5  3.2834311  -7.5667008 14.133563 0.9728941
## 1-7  4.4123115  -3.8492200 12.673843 0.6932885
## 4-7  7.4112699   0.9534442 13.869096 0.0129863
## 2-7  4.9548235  -1.0342371 10.943884 0.1801036
## 3-7  4.8223949  -1.8797300 11.524520 0.3355079
## 6-7  5.4972842  -3.2751566 14.269725 0.5099975
## 4-1  2.9989584  -4.9985346 10.996451 0.9244121
## 2-1  0.5425120  -7.0814783  8.166502 0.9999926
## 3-1  0.4100834  -7.7859436  8.606110 0.9999991
## 6-1  1.0849728  -8.8755543 11.045500 0.9999082
## 2-4 -2.4564464  -8.0756846  3.162792 0.8536983
## 3-4 -2.5888750  -8.9626862  3.784936 0.8924347
## 6-4 -1.9139856 -10.4382281  6.610257 0.9943324
## 3-2 -0.1324286  -6.0308016  5.765944 1.0000000
## 6-2  0.5424607  -7.6323814  8.717303 0.9999951
## 6-3  0.6748894  -8.0358899  9.385669 0.9999877
  • Number of MG
summary(a_n_MG)
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## cluster2      6    174  28.966   2.919 0.00818 **
## Residuals   586   5814   9.922                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_n_MG)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = n_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))
## 
## $cluster2
##           diff         lwr        upr     p adj
## 7-5  0.4961538 -1.26952538  2.2618331 0.9816320
## 1-5  0.5061538 -1.08715591  2.0994636 0.9659206
## 4-5  0.6576293 -1.10131763  2.4165761 0.9262042
## 2-5  0.9185223 -0.57864979  2.4156943 0.5382961
## 3-5  1.9003707  0.25218456  3.5485569 0.0122082
## 6-5  0.2638009 -1.37689934  1.9045011 0.9991366
## 1-7  0.0100000 -1.51182411  1.5318241 1.0000000
## 4-7  0.1614754 -1.53298816  1.8559390 0.9999589
## 2-7  0.4223684 -0.99848934  1.8432262 0.9755294
## 3-7  1.4042169 -0.17496970  2.9834034 0.1186677
## 6-7 -0.2323529 -1.80372493  1.3390190 0.9994659
## 4-1  0.1514754 -1.36253240  1.6654832 0.9999452
## 2-1  0.4123684 -0.78756893  1.6123058 0.9501389
## 3-1  1.3942169  0.01043832  2.7779954 0.0469106
## 6-1 -0.2423529 -1.61720667  1.1325008 0.9985438
## 2-4  0.2608930 -1.15158984  1.6733759 0.9981051
## 3-4  1.2427415 -0.32891411  2.8143970 0.2270433
## 6-4 -0.3938284 -1.95763171  1.1699750 0.9896421
## 3-2  0.9818484 -0.29005219  2.2537491 0.2535156
## 6-2 -0.6547214 -1.91690635  0.6074636 0.7237356
## 6-3 -1.6365698 -3.07466034 -0.1984793 0.0141648
  • Number of SR101
summary(a_nSR101)
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## cluster2      6    296   49.31     3.4 0.00279 **
## Residuals   381   5525   14.50                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_nSR101)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = n_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(n_SR101)))
## 
## $cluster2
##            diff       lwr        upr     p adj
## 7-5  0.49254844 -2.314512  3.2996091 0.9985594
## 1-5 -0.92521994 -4.071771  2.2213309 0.9765467
## 4-5 -2.10242424 -4.839160  0.6343118 0.2578385
## 2-5 -1.51918265 -4.157336  1.1189709 0.6118362
## 3-5 -1.42471591 -4.214284  1.3648518 0.7364154
## 6-5 -1.87062937 -5.140357  1.3990985 0.6192901
## 1-7 -1.41776838 -3.907412  1.0718751 0.6245299
## 4-7 -2.59497268 -4.541063 -0.6488827 0.0017719
## 2-7 -2.01173109 -3.816557 -0.2069052 0.0178879
## 3-7 -1.91726434 -3.936975  0.1024461 0.0754140
## 6-7 -2.36317781 -5.006786  0.2804301 0.1143430
## 4-1 -1.17720430 -3.587279  1.2328702 0.7752505
## 2-1 -0.59396271 -2.891481  1.7035553 0.9879413
## 3-1 -0.49949597 -2.969399  1.9704075 0.9968096
## 6-1 -0.94540943 -3.947052  2.0562327 0.9669323
## 2-4  0.58324159 -1.110137  2.2766201 0.9490331
## 3-4  0.67770833 -1.243064  2.5984802 0.9429015
## 6-4  0.23179487 -2.337018  2.8006073 0.9999697
## 3-2  0.09446674 -1.683030  1.8719635 0.9999987
## 6-2 -0.35144672 -2.814966  2.1120726 0.9995587
## 6-3 -0.44591346 -3.070939  2.1791125 0.9988011
  • Mean distance to BV
summary(a_BV_mean)
##              Df Sum Sq Mean Sq F value Pr(>F)
## cluster2      6    201   33.58    0.96  0.452
## Residuals   625  21870   34.99
TukeyHSD(a_BV_mean)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = BV_mean ~ cluster2, data = subset(master_class_sum2, !is.na(BV_mean)))
## 
## $cluster2
##             diff        lwr      upr     p adj
## 7-5  0.114380955 -2.8875803 3.116342 0.9999998
## 1-5  1.343411715 -1.9266146 4.613438 0.8882539
## 4-5  1.531024067 -1.4533920 4.515440 0.7343193
## 2-5  0.883749247 -1.9221234 3.689622 0.9673766
## 3-5  0.050925195 -3.1611584 3.263009 1.0000000
## 6-5  0.891688746 -2.5906524 4.374030 0.9887070
## 1-7  1.229030760 -1.5019173 3.959979 0.8371332
## 4-7  1.416643112 -0.9648993 3.798186 0.5760417
## 2-7  0.769368292 -1.3842162 2.922953 0.9402255
## 3-7 -0.063455760 -2.7247499 2.597838 1.0000000
## 6-7  0.777307791 -2.2045884 3.759204 0.9875916
## 4-1  0.187612352 -2.5240375 2.899262 0.9999938
## 2-1 -0.459662468 -2.9734697 2.054145 0.9982125
## 3-1 -1.292486520 -4.2528559 1.667883 0.8558759
## 6-1 -0.451722969 -3.7033388 2.799893 0.9996280
## 2-4 -0.647274820 -2.7763342 1.481785 0.9726408
## 3-4 -1.480098872 -4.1215860 1.161388 0.6447799
## 6-4 -0.639335321 -3.6035676 2.324897 0.9955226
## 3-2 -0.832824053 -3.2707814 1.605133 0.9516040
## 6-2  0.007939499 -2.7764554 2.792334 1.0000000
## 6-3  0.840763552 -2.3525756 4.034103 0.9869246
  • Minimal distance to BV
summary(a_BV_min)
##              Df Sum Sq Mean Sq F value Pr(>F)  
## cluster2      6    400   66.73   2.357 0.0293 *
## Residuals   625  17693   28.31                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_BV_min)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = BV_min ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))
## 
## $cluster2
##            diff         lwr       upr     p adj
## 7-5  1.29094761 -1.40914409 3.9910393 0.7941441
## 1-5  1.74500151 -1.19619930 4.6862023 0.5791495
## 4-5  2.72606132  0.04175054 5.4103721 0.0438061
## 2-5  2.21955044 -0.30417080 4.7432717 0.1272230
## 3-5  1.27030482 -1.61877987 4.1593895 0.8516710
## 6-5  0.60080414 -2.53136168 3.7329700 0.9976621
## 1-7  0.45405390 -2.00227697 2.9103848 0.9981000
## 4-7  1.43511371 -0.70694687 3.5771743 0.4273434
## 2-7  0.92860283 -1.00842274 2.8656284 0.7920614
## 3-7 -0.02064279 -2.41432395 2.3730384 1.0000000
## 6-7 -0.69014347 -3.37218782 1.9919009 0.9884104
## 4-1  0.98105981 -1.45791345 3.4200331 0.8979741
## 2-1  0.47454892 -1.78647630 2.7355741 0.9961461
## 3-1 -0.47469669 -3.13737889 2.1879855 0.9984505
## 6-1 -1.14419737 -4.06883904 1.7804443 0.9096646
## 2-4 -0.50651088 -2.42147752 1.4084558 0.9866053
## 3-4 -1.45575650 -3.83162238 0.9201094 0.5400894
## 6-4 -2.12525718 -4.79141380 0.5408994 0.2185533
## 3-2 -0.94924561 -3.14204820 1.2435570 0.8608218
## 6-2 -1.61874630 -4.12314953 0.8856569 0.4731006
## 6-3 -0.66950068 -3.54172579 2.2027244 0.9931665
  • BV contact
summary(a_BV_contact)
##              Df Sum Sq Mean Sq F value Pr(>F)
## cluster2      6   1.46  0.2426   1.505  0.174
## Residuals   625 100.76  0.1612
TukeyHSD(a_BV_contact)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = BV_contact ~ cluster2, data = subset(master_class_sum2, !is.na(BV_contact)))
## 
## $cluster2
##             diff         lwr        upr     p adj
## 7-5  0.006800382 -0.19695851 0.21055928 0.9999999
## 1-5  0.059229765 -0.16272411 0.28118364 0.9859693
## 4-5  0.091152403 -0.11141560 0.29372041 0.8372108
## 2-5  0.031521050 -0.15892828 0.22197038 0.9989834
## 3-5 -0.062548543 -0.28056955 0.15547246 0.9795994
## 6-5  0.102116602 -0.13424820 0.33848141 0.8619585
## 1-7  0.052429383 -0.13293442 0.23779318 0.9810314
## 4-7  0.084352021 -0.07729578 0.24599983 0.7182519
## 2-7  0.024720668 -0.12145444 0.17089577 0.9988507
## 3-7 -0.069348925 -0.24998495 0.11128710 0.9169368
## 6-7  0.095316220 -0.10708075 0.29771319 0.8056035
## 4-1  0.031922638 -0.15213129 0.21597657 0.9986737
## 2-1 -0.027708715 -0.19833403 0.14291660 0.9990877
## 3-1 -0.121778308 -0.32271414 0.07915753 0.5534977
## 6-1  0.042886837 -0.17781743 0.26359110 0.9974866
## 2-4 -0.059631353 -0.20414181 0.08487910 0.8861264
## 3-4 -0.153700946 -0.33299256 0.02559067 0.1483902
## 6-4  0.010964199 -0.19023383 0.21216223 0.9999985
## 3-2 -0.094069594 -0.25954658 0.07140739 0.6287125
## 6-2  0.070595552 -0.11839597 0.25958707 0.9265977
## 6-3  0.164665146 -0.05208357 0.38141386 0.2719348
  • BV standard deviation (sd)
summary(a_BV_sd)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## cluster2      6   7.92   1.320   12.23 4.92e-13 ***
## Residuals   625  67.47   0.108                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_BV_sd)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = BV_sd ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))
## 
## $cluster2
##             diff         lwr         upr     p adj
## 7-5 -0.243957825 -0.41069736 -0.07721829 0.0003506
## 1-5 -0.167763213 -0.34939202  0.01386560 0.0920777
## 4-5 -0.263551544 -0.42931655 -0.09778653 0.0000645
## 2-5 -0.288589329 -0.44443741 -0.13274125 0.0000013
## 3-5 -0.164469714 -0.34288018  0.01394076 0.0933383
## 6-5  0.070851456 -0.12257007  0.26427298 0.9328777
## 1-7  0.076194612 -0.07549189  0.22788111 0.7533175
## 4-7 -0.019593719 -0.15187300  0.11268556 0.9994615
## 2-7 -0.044631505 -0.16424920  0.07498619 0.9269728
## 3-7  0.079488111 -0.06832956  0.22730579 0.6884603
## 6-7  0.314809280  0.14918423  0.48043433 0.0000006
## 4-1 -0.095788331 -0.24640294  0.05482628 0.4936033
## 2-1 -0.120826117 -0.26045185  0.01879962 0.1403949
## 3-1  0.003293499 -0.16113587  0.16772287 1.0000000
## 6-1  0.238614668  0.05800844  0.41922090 0.0019826
## 2-4 -0.025037785 -0.14329327  0.09321770 0.9959558
## 3-4  0.099081830 -0.04763569  0.24579935 0.4171773
## 6-4  0.334402999  0.16975907  0.49904693 0.0000001
## 3-2  0.124119616 -0.01129315  0.25953238 0.0971041
## 6-2  0.359440785  0.20478566  0.51409591 0.0000000
## 6-3  0.235321169  0.05795183  0.41269050 0.0018590


The output in this section shows a boxplot for each small-scale feature in each behavioral cluster

  • Distance to 3 neighbors
p_dist3_neigth

  • Distance to 10 neighbors
p_dist_10_neigh

  • Minimal distance to an MG
p_min_MG

  • Minimal distance to a SR101
p_min_SR101

  • Number of MG
p_n_MG

  • Number of SR101
p_nSR101

  • Mean distance to BV
p_BV_mean

  • Minimal distance to BV
p_BV_min

  • BV contact
p_BV_contact

  • BV standard deviation (sd)
p_BV_sd

Differences based on large-scale regions

This section analyzes the differences between large-scale regions for small-scale features and provides statistical support in each case.

Code section
#### Differences based on environmental clusters

### plot differences of the values per cluster
master_class_sum <- as.data.frame(master_class_sum)
master_class_sum$cluster2 <- as.character(master_class_sum$cluster2)
df1$mean_movement <- df1$movement
master_class_sum2 <- left_join(master_class_sum, df1[, c("cluster2", "mean_movement")])
master_class_sum2$cluster2 = factor(master_class_sum2$cluster2, levels = df1$cluster2)


#### try an anova between clusters based on distance to dist_10_neigh
p_class_dist_10neigh <- ggplot(master_class_sum2, aes(x = as.factor(class), y = dist_10_neigh, fill = class)) +
    geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median, geom = "crossbar",
    size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold", "red")) + ggtitle("dist 10 neigh per cluster") +
    theme_classic() + theme(aspect.ratio = 1) + coord_cartesian(ylim = c(20, 70))

a_class_dist_10neigh <- aov(dist_10_neigh ~ class, data = master_class_sum2)



#### try an anova between clusters based on dist to MG
p_class_minMG <- ggplot(subset(master_class_sum2, !is.na(min_MG)), aes(x = as.factor(class), y = min_MG,
    fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median, geom = "crossbar",
    size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold", "red")) + ggtitle("distance to closest MG") +
    theme_classic() + theme(aspect.ratio = 1) + coord_cartesian(ylim = c(0, 50))

a_class_minMG <- aov(min_MG ~ class, data = subset(master_class_sum2, !is.na(min_MG)))



#### try an anova between clusters based on dist to SR101
p_classmin_SR101 <- ggplot(subset(master_class_sum2, !is.na(min_SR101)), aes(x = as.factor(class),
    y = min_SR101, fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median,
    geom = "crossbar", size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold",
    "red")) + ggtitle("distance to closest SR101") + theme_classic() + theme(aspect.ratio = 1) +
    coord_cartesian(ylim = c(0, 50))

a_classmin_SR101 <- aov(min_SR101 ~ class, data = subset(master_class_sum2, !is.na(min_SR101)))



#### try an anova between clusters based on n of MG
p_class_n_MG <- ggplot(subset(master_class_sum2, !is.na(min_MG)), aes(x = as.factor(class), y = n_MG,
    fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median, geom = "crossbar",
    size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold", "red")) + ggtitle("n of MG per cluster") +
    theme_classic() + theme(aspect.ratio = 1)

a_class_n_MG <- aov(n_MG ~ class, data = subset(master_class_sum2, !is.na(min_MG)))



#### try an anova between clusters based on n of SR101
p_classn_SR101 <- ggplot(subset(master_class_sum2, !is.na(min_SR101)), aes(x = as.factor(class),
    y = n_SR101, fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median,
    geom = "crossbar", size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold",
    "red")) + ggtitle("n SR101 per cluster") + theme_classic() + theme(aspect.ratio = 1)

a_classn_SR101 <- aov(n_SR101 ~ class, data = subset(master_class_sum2, !is.na(min_SR101)))



#### try an anova between clusters based on BV distance min
p_class_BV_min <- ggplot(subset(master_class_sum2, !is.na(BV_min)), aes(x = as.factor(class), y = BV_min,
    fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median, geom = "crossbar",
    size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold", "red")) + ggtitle("min distance to BV") +
    theme_classic() + theme(aspect.ratio = 1) + coord_cartesian(ylim = c(0, 18))

a_class_BV_min <- aov(BV_min ~ class, data = subset(master_class_sum2, !is.na(BV_min)))



#### try an anova between clusters based on BV distance mean
p_class_BV_mean <- ggplot(subset(master_class_sum2, !is.na(BV_mean)), aes(x = as.factor(class),
    y = BV_mean, fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median,
    geom = "crossbar", size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold",
    "red")) + ggtitle("mean distance to BV") + theme_classic() + theme(aspect.ratio = 1) + coord_cartesian(ylim = c(0,
    18))

a_class_BV_mean <- aov(BV_mean ~ class, data = subset(master_class_sum2, !is.na(BV_mean)))


## relation between position relative to the tumor and amount of MG:

pcor_MG_position <- ggplot(subset(master_class_sum2, !is.na(min_MG)), aes(x = distance_to_tumor,
    y = n_MG)) + geom_jitter() + geom_point(alpha = 0.5, stat = "summary") + geom_smooth(method = "lm") +
    ggtitle("scatter plot movement vs min_MG") + theme_classic() + theme(aspect.ratio = 1)
cor_18 <- cor.test(master_class_sum2$n_MG, master_class_sum2$distance_to_tumor, method = "pearson",
    use = "pairwise.complete.obs")
See Statistical Information


Statistical information between large-scale regions (environmental clusters) based on ANOVA and Tukey’s HSD (Honestly Significant Difference) test:

  • Distance to 10 neighbors
summary(a_class_dist_10neigh)
##              Df Sum Sq Mean Sq F value Pr(>F)
## class         2   1320   659.9   1.577  0.207
## Residuals   978 409213   418.4
TukeyHSD(a_class_dist_10neigh)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = dist_10_neigh ~ class, data = master_class_sum2)
## 
## $class
##               diff       lwr      upr     p adj
## CL2-CL1 -2.5151366 -6.376437 1.346164 0.2777655
## CL3-CL1 -0.1387336 -3.950374 3.672907 0.9959840
## CL3-CL2  2.3764030 -1.265054 6.017860 0.2764354
  • Minimal distance to an MG
summary(a_class_minMG)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## class         2   2813  1406.4   10.02 5.26e-05 ***
## Residuals   590  82815   140.4                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_minMG)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = min_MG ~ class, data = subset(master_class_sum2, !is.na(min_MG)))
## 
## $class
##              diff       lwr       upr     p adj
## CL2-CL1 -5.075902 -7.941646 -2.210159 0.0001074
## CL3-CL1 -3.773329 -6.454107 -1.092551 0.0028689
## CL3-CL2  1.302574 -1.640317  4.245465 0.5519149
  • Minimal distance to an SR101
summary(a_classmin_SR101)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## class         2   3438  1719.0   11.13 1.99e-05 ***
## Residuals   385  59436   154.4                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_classmin_SR101)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = min_SR101 ~ class, data = subset(master_class_sum2, !is.na(min_SR101)))
## 
## $class
##              diff        lwr       upr     p adj
## CL2-CL1 -9.024564 -13.534491 -4.514638 0.0000104
## CL3-CL1 -7.309367 -11.897598 -2.721135 0.0006007
## CL3-CL2  1.715198  -1.496459  4.926854 0.4207669
  • Number of MG
summary(a_class_n_MG)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## class         2    644   322.0   35.55 2.64e-15 ***
## Residuals   590   5344     9.1                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_n_MG)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = n_MG ~ class, data = subset(master_class_sum2, !is.na(min_MG)))
## 
## $class
##              diff        lwr        upr     p adj
## CL2-CL1  2.592120  1.8641533  3.3200858 0.0000000
## CL3-CL1  1.337224  0.6562435  2.0182050 0.0000145
## CL3-CL2 -1.254895 -2.0024589 -0.5073317 0.0002641
  • Number of SR101
summary(a_classn_SR101)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## class         2    422  210.98   15.05 5.11e-07 ***
## Residuals   385   5398   14.02                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_classn_SR101)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = n_SR101 ~ class, data = subset(master_class_sum2, !is.na(min_SR101)))
## 
## $class
##              diff        lwr      upr     p adj
## CL2-CL1 2.6028601  1.2436788 3.962041 0.0000261
## CL3-CL1 3.2029326  1.8201521 4.585713 0.0000003
## CL3-CL2 0.6000725 -0.3678421 1.567987 0.3121099
  • Minimal distance to BV
summary(a_class_BV_min)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## class         2    591  295.71   10.63 2.89e-05 ***
## Residuals   629  17502   27.82                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_BV_min)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = BV_min ~ class, data = subset(master_class_sum2, !is.na(BV_min)))
## 
## $class
##              diff       lwr        upr     p adj
## CL2-CL1 -1.059503 -2.238974  0.1199684 0.0885449
## CL3-CL1 -2.422248 -3.657506 -1.1869908 0.0000148
## CL3-CL2 -1.362746 -2.580036 -0.1454545 0.0237365
  • Mean distance to BV
summary(a_class_BV_mean)
##              Df Sum Sq Mean Sq F value  Pr(>F)   
## class         2    463  231.36   6.735 0.00128 **
## Residuals   629  21609   34.35                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_BV_mean)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = BV_mean ~ class, data = subset(master_class_sum2, !is.na(BV_mean)))
## 
## $class
##               diff       lwr         upr     p adj
## CL2-CL1 -1.3233714 -2.633946 -0.01279692 0.0471867
## CL3-CL1 -2.1065180 -3.479080 -0.73395634 0.0009792
## CL3-CL2 -0.7831466 -2.135745  0.56945151 0.3625740


Statistical information based on correlation test between position relative to the tumor and amount of MG:

cor_18
## 
##  Pearson's product-moment correlation
## 
## data:  master_class_sum2$n_MG and master_class_sum2$distance_to_tumor
## t = -0.14404, df = 591, p-value = 0.8855
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.08639938  0.07462654
## sample estimates:
##          cor 
## -0.005924827

The output in this section shows a boxplot for each feature in each behavioral cluster according to large-scale regions:

  • Distance to 10 neighbors
p_class_dist_10neigh

  • Minimal distance to an MG
p_class_minMG

  • Minimal distance to an SR101
p_classmin_SR101

  • Number of MG
p_class_n_MG

  • Number of SR101
p_classn_SR101

  • Minimal distance to BV
p_class_BV_min

  • Mean distance to BV
p_class_BV_mean



Session Information
## R version 4.3.3 (2024-02-29 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## 
## Random number generation:
##  RNG:     L'Ecuyer-CMRG 
##  Normal:  Inversion 
##  Sample:  Rejection 
##  
## locale:
## [1] LC_COLLATE=Spanish_Spain.utf8  LC_CTYPE=Spanish_Spain.utf8   
## [3] LC_MONETARY=Spanish_Spain.utf8 LC_NUMERIC=C                  
## [5] LC_TIME=Spanish_Spain.utf8    
## 
## time zone: Europe/Madrid
## tzcode source: internal
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] zoo_1.8-12             viridis_0.6.5          viridisLite_0.4.2     
##  [4] umap_0.2.10.0          tidyr_1.3.1            spatstat_3.0-8        
##  [7] spatstat.linnet_3.1-5  spatstat.model_3.2-11  rpart_4.1.23          
## [10] spatstat.explore_3.2-7 nlme_3.1-164           spatstat.random_3.2-3 
## [13] spatstat.geom_3.2-9    spatstat.data_3.0-4    sp_2.1-4              
## [16] scales_1.3.0           reshape2_1.4.4         readr_2.1.5           
## [19] RANN_2.6.1             plotly_4.10.4          pheatmap_1.0.12       
## [22] gtools_3.9.5           ggplot2_3.5.1          emmeans_1.10.2        
## [25] e1071_1.7-14           dtwclust_5.5.12        dtw_1.23-1            
## [28] proxy_0.4-27           dplyr_1.1.4            DT_0.33               
## 
## loaded via a namespace (and not attached):
##  [1] deldir_2.0-4          gridExtra_2.3         formatR_1.14         
##  [4] rlang_1.1.3           magrittr_2.0.3        clue_0.3-65          
##  [7] flexclust_1.4-2       compiler_4.3.3        mgcv_1.9-1           
## [10] png_0.1-8             vctrs_0.6.5           stringr_1.5.1        
## [13] crayon_1.5.2          pkgconfig_2.0.3       fastmap_1.2.0        
## [16] labeling_0.4.3        utf8_1.2.4            promises_1.3.0       
## [19] rmarkdown_2.27        tzdb_0.4.0            bit_4.0.5            
## [22] purrr_1.0.2           xfun_0.44             modeltools_0.2-23    
## [25] cachem_1.1.0          jsonlite_1.8.8        goftest_1.2-3        
## [28] highr_0.11            later_1.3.2           spatstat.utils_3.0-4 
## [31] cluster_2.1.6         R6_2.5.1              bslib_0.7.0          
## [34] stringi_1.8.4         RColorBrewer_1.1-3    reticulate_1.37.0    
## [37] jquerylib_0.1.4       estimability_1.5.1    Rcpp_1.0.12          
## [40] iterators_1.0.14      knitr_1.47            tensor_1.5           
## [43] splines_4.3.3         httpuv_1.6.15         Matrix_1.6-5         
## [46] tidyselect_1.2.1      rstudioapi_0.16.0     abind_1.4-5          
## [49] yaml_2.3.8            codetools_0.2-20      lattice_0.22-6       
## [52] tibble_3.2.1          plyr_1.8.9            shiny_1.8.1.1        
## [55] withr_3.0.0           askpass_1.2.0         evaluate_0.23        
## [58] RcppParallel_5.1.7    polyclip_1.10-6       pillar_1.9.0         
## [61] foreach_1.5.2         stats4_4.3.3          shinyjs_2.1.0        
## [64] generics_0.1.3        vroom_1.6.5           hms_1.1.3            
## [67] munsell_0.5.1         xtable_1.8-4          class_7.3-22         
## [70] glue_1.7.0            lazyeval_0.2.2        tools_4.3.3          
## [73] data.table_1.15.4     RSpectra_0.16-1       mvtnorm_1.2-5        
## [76] grid_4.3.3            crosstalk_1.2.1       colorspace_2.1-0     
## [79] cli_3.6.2             spatstat.sparse_3.0-3 fansi_1.0.6          
## [82] gtable_0.3.5          sass_0.4.9            digest_0.6.35        
## [85] ggrepel_0.9.5         farver_2.1.2          htmlwidgets_1.6.4    
## [88] htmltools_0.5.8       lifecycle_1.0.4       httr_1.4.7           
## [91] mime_0.12             bit64_4.0.5           openssl_2.2.0


Thank you for reading the BEHAV3D Tumor Profiler - Guided Tutorial.